Introduction


The purpose of this book is to provide back- 
ground material for teachers and students of 
APL. In a course on APL the focus is neces- 
sarily on the details of the language and its 
use; it may not always be apparent what the 
purpose of a particular rule might be, nor how 
one piece of the language relates to the whole. 
This book is a collection of articles that deal 
with the more fundamental issues of the lan- 
guage. They appeared in widely scattered 
sources, over a period of many years, and are
not always easy to find. They are arranged in
the order of their appearance, so it is possible
to get a sense of the development of the lan- 
guage from reading the articles in sequence. 
The first article, Formalism in Program- 
ming Languages, appeared before there was 
an implementation. The reader who knows
only contemporary APL will have to master 
some differences in notation in order to under- 
stand it. The effort will be repaid, however, 
because it condenses in a very small space 
some information on the properties of the sca- 
lar functions which appears nowhere else. In 
the discussion following the paper, R.A. 
Brooker asks a key question, one which has 
followed APL through its development: 


Why do you insist on using a notation
which is a nightmare for typist and com- 
positor and impossible to implement with 
punching and printing equipment cur- 
rently available? What proposals have 
you got for overcoming this difficulty? 


The question had no good answer at the 
time. The best that had been proposed in-
volved transliteration rules that would have
made it very difficult to work with the lan- 
guage. It was not until the advent of IBM’s 
Selectric typewriter, with its replaceable print-
ing element, that it became possible to think of
developing a special APL printing element. 
Jean Sammet dismissed the paper in her review 
of it two years later by writing, “as soon as 
[the author] starts to defend the work on the 
grounds that it is currently practical, he is on
very weak grounds.” By the time the review 
appeared, however, the very impractical nota- 
tion had found its implementers, and I read the 
review as I was sitting at a terminal connected 
to a 7090 system which was the time-sharing 
host for something called IVSYS, the im- 
mediate precursor of what would be called 
APL. 

The second paper is connected with the 
transition from a pure notation to an im-
plemented programming language. When it
was written, although implementations had 
begun to appear, and the APL printing element 
had been developed, it was still not clear what 
was the best way to publish the language. In 
the book, as you can see from the selection, 
use was still made of boldface and italic type 
styles, rather than the single font imposed by 
the printing element. In the answer book, how- 
ever, the functions were displayed in both the 
old style and the new, so that the user could 
easily see how to translate between the two. 

In the third selection, Algebra as a Lan-
guage, the case is made for the superiority of
APL notation over those of conventional arith- 
metic and algebra. It also gives a discussion of 
the analogies between teaching a mathematical
notation and teaching a natural language, a 
note that will be heard again in the last selec- 
tion. The paper makes clear that there is a 
larger purpose to APL than merely to give 
people something in which to program. What 
is intended is a thorough reform of the way 
mathematics is taught, given the existence of 
the computer. 


The next two papers form a pair and can be 
discussed together. In the first, The Design of 
APL, Falkoff and Iverson give the reasons for 
many of the design decisions that went into 
APL. The occasion for the second paper, The 
Evolution of APL, was a conference on the 
history of programming languages. The criteria 
for a language to be represented at this confer-
ence were that it 1) was created and in use by
1967; 2) was still in use by 1977; and 3) had
considerably influenced the field of
computing. In the introduction to the proceed- 
ings, APL was described as follows: 


This language has received widespread 
use in the past few years, increasing from 
a few highly specialized mathematical 
uses to many people using it for quite dif- 
ferent applications, including those in 
business. Its unique character set, fre- 
quent emphasis on cryptic “one-liner” 
programs, and its effective initial im- 
plementation as an interactive system 
make it important. In addition, the
uniqueness of its overall approach and
philosophy makes it significant.


This quotation properly notes the success of 
APL in commercial areas, and also gives ap- 
propriate credit to the effectiveness of the ini- 
tial implementation. One has to have lived 
through the trauma of early time-sharing sys- 
tems to be able to appreciate how good this 
first APL really was. I could tell dozens of 
stories about how bad most early time-sharing 
systems were, and for each of the bad ones, I 
could tell a dozen stories about the good qual- 
ities of this first APL. 

The last three papers have in common that
they use the direct definition form of function 
definition. It is a bit early yet to say how im-
portant this concept will be, but there is begin-
ning to be some evidence to suggest that it will
have applicability in many areas of program- 
ming. At first glance, it might appear that its 
use would be restricted to simple mathematical 
functions, and might not, perhaps, be employ- 
ed in large-scale programming activities. How- 
ever, I have seen reasonably large report 
generators —involving several dozen func- 
tions—built using this form, and have seen 
other systems in which two or three hundred of 
these functions interact. 

As APL enters its third decade, it promises 
to find a significantly larger number of users.
Those who truly wish to master it should know 
more than just the meanings of its primitive 
function symbols. This book is meant to help 
them! 


A note on the origins of “APL” 


I remember quite well the day I first heard the 
name APL. It was the summer of 1966 and I 
was working in the IBM Mohansic Laboratory, 
a small building in Yorktown Heights, NY. 
The project I was working on was IBM’s first 
effort at developing a commercial time-sharing 
system, one which was called TSS. The sys- 
tem was showing signs of becoming incom- 
prehensible as more and more bells and whis- 
tles were added to it. As an experiment in 
documentation, I had hired three summer stu- 
dents and given them the job of transforming 
the “development workbook” type of documen- 
tation we had for certain parts of the system 
into something more formal, namely Iverson 
notation, which the three students had learned 
while taking a course given by Ken Iverson at 
Fox Lane High School in Mount Kisco, NY. 
One of the students was Eric Iverson, Ken’s 
son. 

As I walked by the office the three students 
shared, I could hear sounds of an argument 
going on. I poked my head in the door, and
Eric asked me, “Isn’t it true that everyone
knows the notation we're using is called 
APL?” I was sorry to have to disappoint him 
by confessing that I had never heard it called 
that. Where had he got the idea it was well 
known? And who had decided to call it that? 
In fact, why did it have to be called anything? 
Quite a while later I heard how it was named. 
When the implementation effort started in June 
of 1966, the documentation effort started, too. 
I suppose when they had to write about “it,” 
Falkoff and Iverson realized that they would 
have to give “it” a name. There were probably 
many suggestions made at that time, but I have 
heard of only two. A group at SRA in Chicago 
which was developing instructional materials 
using the notation was in favor of the name 
“Mathlab.” This did not catch on. Another 
suggestion was to call it “Iverson’s Better 
Math” and then let people coin the appropriate 
acronym. This was deemed facetious. 


Then one day Adin Falkoff walked into 
Ken’s office and wrote “A Programming Lan-
guage” on the board, and underneath it the ac-
ronym “APL.” Thus it was born. It was just a
week or so after this that Eric Iverson asked 
me his question, at a time when the name 
hadn’t yet found its way the thirteen miles up 
the Taconic Parkway from IBM Research to 
IBM Mohansic. 


There was a period of time, however, when 
the name was in danger of having to be 
changed. IBM had just gotten over the experi- 
ence of having to withdraw the name NPL 
which it had given to its “New Programming 
Language,” because of a conflict with the use 
of the same initials by Britain’s National 
Physical Laboratory. The conflict involving
APL arose when a paper appeared in the 1966 
AFIPS Fall Joint Computer Conference Pro- 
ceedings. It was by George Dodd, of General 
Motors Research, and was entitled APL—a 
language for associative data handling in PL/I. 
(PL/I was the name now given to the former 
NPL.) In the review of this paper that ap- 
peared in Computing Reviews 8, for Sep- 
tember-October 1967 (review 12,753), Saul 
Rosen wrote: 


This reviewer has one suggestion that is 
offered quite seriously, though some 
readers might consider it frivolous. There 
already exists at least one language that 
is reasonably well known by its acronym 
APL. I refer to the language developed 
by Iverson for which translators and in- 
terpreters have been written on a number 
of computers. It would be helpful if the 
authors of the present article could make 
some minor change in the name of their 
processor to remove this very global am- 
biguity. 


George Dodd replied in a letter to the editor 
that appeared in CACM 11, for May 1968, p. 
378: 


I would like to offer a rebuttal to the last 
paragraph of the otherwise excellent and 
accurate review of APL—a language for 
associative data handling in PL/I. . . 
In the review it is pointed out that there 
already exists one other language known 
by the acronym APL, that being the lan- 
guage developed by Kenneth Iverson of 
IBM. The reviewer concludes that the 
name of our processor should be changed 
to avoid a conflict of names. 

Before naming the language we con- 
ducted a thorough search of Computing 
Reviews, AFIPS Reviews, and other 
sources, and at that time (spring, 1966) 
ascertained that the APL acronym was 
unique. Unfortunately, Iverson’s lan- 
guage, which is an internal IBM develop- 
ment project and not an announced prod- 
uct, has also come to be known by the 
same name. We feel our public reference 
to APL preceded Iverson’s and that a 
more reasonable request from the re- 
viewer would be that the name of the 
Iverson APL be changed. 


There was a short but fairly intense skirmish 
inside IBM following the George Dodd letter. 
I don’t know all the details, but I believe the 
IBM branch office which handled the General 
Motors account was supporting George Dodd,
and the case for IBM’s right to use the initials 
was being made by Al Rose. I don’t know
what became of George Dodd's processor. The 
issue wasn't resolved until late in 1968, and 
was one of the things preventing the release of 
APL as a product. Rose eventually won the 
day by making the case that Iverson had estab- 
lished his stake in the initials when his book A 
Programming Language was published in 
1962, long before Dodd’s use of the letters in 
1966. The story goes that, at the final meeting 
to decide whether to release APL, the account 
representative said, “The Detroit branch office 
nonconcurs—” at which point the vice presi- 
dent sitting in judgment replied, “That settles 
it! Branch offices don’t nonconcur.” And so 
IBM retained the use of the letters. 


Curiously, in view of the National Physical
Laboratory’s objection to the programming lan- 
guage named NPL, the Applied Physics Labo- 
ratory of Johns Hopkins University never made 
an issue, as far as I am aware, of IBM’s joint 
use with them of the initials APL. 

There is at least one other claimant to the 
initials. When the IBM Philadelphia Scientific 
Center closed in 1974, many of the APL 
people there moved across the continent to the 
San Francisco area, to work at an IBM lan- 
guage development location in Palo Alto. 
While this was going on, one of those moving 
picked up a copy of the San Francisco Chroni- 
cle which had the headline, “APL LEAVES 
SAN FRANCISCO.” Since he had just pulled 
up stakes in the Philadelphia area, he was star- 
tled to see that the same thing was about to 
happen again in San Francisco. On closer in- 
spection, however, it developed that the story 
concerned the departure of the facilities of the 
steamship company, American President Lines, 
from the docks of San Francisco to the docks 
across the bay in Oakland. 


Eugene E. McDonnell 


September 1981
Palo Alto 
Introduction


Although the question of equivalences between algo-
rithms expressed in the same or different languages has
received some attention in the literature, the more practical 
question of formal identities among statements in a single 
language has received virtually none. The importance of 
such identities in theoretical work is fairly obvious. The
present paper will be addressed primarily to the practical 
implications for a compiler. 

The formal identities can be incorporated directly into a
compiler, or can alternatively be used by a programmer to
derive a more efficient equivalent of a program specified
by an analyst. The identities cited include (1) dualities
which permit the inclusion of only one of a dual pair as a
basic operator, (2) partitioning identities which permit the
automatic allocation of limited fast-access storage in oper-
ations on arrays, (3) permutation identities which permit
the adoption of a processing sequence suited to the par-
ticular representation used (e.g., row list or column list of
a matrix), (4) general associativity and distributivity identi-
ties for double operators (determined as a function of the
properties of the basic operators) which permit efficient
reordering of operations, (5) transposition identities, and
(6) the automatic extension of the appropriate identities
to any ad hoc operations (i.e., subroutines or procedures)
defined by any user of the compiler.

The discussion will be based upon a programming lan-
guage which has been presented in full elsewhere [1]. How-
ever, the relevant aspects of the language will first be
summarized for reference.


* Received July, 1963. Presented at a Working Conference on
Mechanical Language Structures, Princeton, N.J., August 1963,
sponsored by the Association for Computing Machinery, the
Institute for Defense Analyses, and the Business Equipment
Manufacturers Association. This work was done at Harvard
University while the author was a visiting lecturer, February
through June, 1963.


The problems of transliteration and syntax which com- 
monly dominate discussions of language will here be sub- 
ordinated as follows. The symbols employed will permit 
the immediate determination of the class to which each 
belongs; thus literals are denoted by roman type, variables 
are denoted by italics (lowercase, lowercase bold, and
uppercase bold for scalar, vector and matrix, respectively),
and operators are denoted by distinct (usually nonalpha-
betic) symbols. The problems of transliteration (i.e., map-
ping the set of symbols employed onto the smaller set
provided in a computer) and of mapping positional infor- 
mation (such as subscripts and superscripts) onto a linear 
representation therefore can, and will, be subordinated to 
questions of the structure of an adequate language. 


The Language¹


1. The left arrow “←” denotes “specification,” and each²
statement in the language is of the form

        x ← f,

where x is a variable and f is some function.

2. The application of any unary operator ⊙ to a scalar
argument x is denoted by ⊙x, and the application of a
binary operator ⊙ to the arguments x, y is denoted by
x ⊙ y. The set of basic operators and symbols is shown in
Table 1. The use of the same symbol for a binary and a
unary operator (e.g., x ⌊ y for min(x, y) and ⌊x for
largest integer not exceeding x) produces no ambiguity
and does conserve symbols.

As shown in Table 1, any relation is treated as an oper-
ator (denoted by the usual symbol for the relation) having
the range zero and one (logical variables). Thus, for integers
i and j, the operator “=” is equivalent to the Kronecker
delta.


¹ The language described here differs from that in [1] in minor
details designed to further systematize and simplify its structure.

² Except for branching statements, which are not relevant to
the present discussion.
TABLE 1. SYMBOLS FOR BASIC OPERATORS


UNARY                                        | BINARY
Operation                           Symbol   | Operation              Symbols
Absolute value                      | |      | Arithmetic operators   + − × ÷
Minus                               −        | Arithmetic relations   < ≤ = ≥ > ≠
Floor (largest integer contained)   ⌊        | Max, Min               ⌈ ⌊
Ceiling (smallest integer           ⌈        | Exponentiation (y^x)   [symbol illegible]
  containing)
Logical negation                    ~        | Residue mod m          m | n
Reciprocation (÷x ⇔ 1 ÷ x)          ÷        | Logical AND, OR        ∧ ∨



TABLE 2. UNARY OPERATIONS DEFINED ON ARRAYS


νx          Dimension of vector x
νA          Row dimension of matrix A (dimension of row
            vectors)
μA          Column dimension of matrix A (dimension of
            column vectors)
⦶ ⊖ ⦸ ⊘     Transposition of matrix about axis indicated
            by the straight line (⦸A is ordinary transposi-
            tion of A)
⦶           ⦶x denotes transposition of vector x (reversal
            of order of components)
⊥           Base-two value of vector


3. The ith component of a vector x is denoted by x_i,
the ith row vector of a matrix M by M^i, the jth column
vector by M_j, and the (i, j)th element by M_j^i. A vector
may be represented by a list of its components separated 
by commas. Thus, the statement 


        x ← 1, 2, 3, 4


specifies x as a vector of dimension 4 comprising the first 
four positive integers. In particular, catenation of two 
vectors x and y may be denoted by x, y. 

4. Operators are extended component-by-component to
arrays. Thus if ⊙ is any operator (unary or binary as
appropriate),³


        r ← ⊙x     ⇔  r_i ← ⊙x_i
        r ← x ⊙ y  ⇔  r_i ← x_i ⊙ y_i
        R ← ⊙M     ⇔  R_j^i ← ⊙M_j^i
        R ← M ⊙ N  ⇔  R_j^i ← M_j^i ⊙ N_j^i.


5. The order of execution of operations is determined by 
parentheses in the usual way and, except for intervening 
parentheses, operations are executed in order from right to 
left, with no priorities accorded to multiplication or other 
operators.
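
The right-to-left rule of item 5 can be sketched in Python. This is only an illustration under stated assumptions: the flat token-list representation and the names `OPS` and `eval_right_to_left` are hypothetical, parentheses are not handled, and only four binary operators are modeled.

```python
import operator

# Hypothetical operator table; every operator has the same priority.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_right_to_left(tokens):
    """Evaluate a flat, parenthesis-free list such as
    [3, "*", 4, "+", 5] strictly from right to left."""
    result = tokens[-1]
    # Step leftwards over (left operand, operator) pairs.
    for i in range(len(tokens) - 2, 0, -2):
        result = OPS[tokens[i]](tokens[i - 1], result)
    return result

print(eval_right_to_left([3, "*", 4, "+", 5]))    # 3 * (4 + 5)
print(eval_right_to_left([10, "-", 3, "-", 2]))   # 10 - (3 - 2)
```

Under the conventional priority of multiplication the first expression would be read as (3 × 4) + 5; under item 5 the addition is performed first.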

6. Certain unary operators are defined upon vectors
and matrices rather than upon scalars. These appear in
Table 2 and include the dimension operators ν and μ as
well as the transposition operators ⦶, ⊖, ⦸, ⊘, in which
the symbols indicate the axis of transposition of a matrix.


³ The symbol ⇔ will be used to denote equivalence.

7. It is convenient to provide symbols for certain con- 
stant vectors and matrices as shown in Table 3. The 
parenthetic expression indicating the dimension of each 
may be elided when it is otherwise determined by conform-
ability with some known vector.


TABLE 3. CONSTANT VECTORS AND SQUARE MATRICES OF
DIMENSION n


Symbol          Designated Constant

Logical Vectors
  ε(n)          Full vector (all 1’s)
  ε^j(n)        jth unit vector (1 in position j)
  α^j(n)        Prefix vector of weight j (j leading 1’s)
  ω^j(n)        Suffix vector of weight j (j trailing 1’s)

  ι^j(n)        Interval vector (j, j+1, ..., j+n−1)

Logical Matrices
  0(n)          Zero matrix
  [pictorial symbols, illegible]
                Identity matrix (1’s on diagonal)
                Strict upper right triangle (1’s above diagonal)
                Upper right triangle (1’s above and on diagonal)
                Strict lower right triangle
                Upper left triangle


8. If a(i) denotes one of a family of variables (e.g.,
scalars x^i or x_i, vectors x^i or X^i or X_i, or matrices ^iX)
for i belonging to some index set i, and if ⊙ is a binary oper-
ator, then for any set s ⊆ i,


        ⊙_s/a(i) ⇔ a(s_1) ⊙ a(s_2) ⊙ ... ⊙ a(s_νs).


If
        a(i) = x_i and s = ι^1(νx),
or if
        a(i) = X_i and s = ι^1(νX),
then s and i may be elided. Thus,
        +/x = x_1 + x_2 + ... + x_νx,
        ∧/x = x_1 ∧ x_2 ∧ ... ∧ x_νx,
        +/X = X_1 + X_2 + ... + X_νX, etc.


If a(i) = X^i and s = ι^1(μX), then the s and i may be
elided provided that a second slash be added to distinguish
this case from the preceding one. Thus,


        ⊙//X = X^1 ⊙ X^2 ⊙ ... ⊙ X^μX.
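
Reduction can be sketched in Python as follows. The sketch assumes that the expanded expression is evaluated from right to left, as item 5 prescribes, and that a matrix is held as a list of row vectors; the function names are hypothetical, and empty vectors are not handled.

```python
from functools import reduce
import operator

def reduce_right(op, xs):
    """x1 op (x2 op (... op xn)): the expansion read right to left."""
    xs = list(xs)
    return reduce(lambda acc, x: op(x, acc), reversed(xs[:-1]), xs[-1])

def reduce_single_slash(op, X):
    """Single slash on a matrix: combine the column vectors,
    which reduces along each row."""
    return [reduce_right(op, row) for row in X]

def reduce_double_slash(op, X):
    """Double slash on a matrix: combine the row vectors,
    which reduces along each column."""
    return [reduce_right(op, col) for col in zip(*X)]

X = [[1, 2], [3, 4]]
print(reduce_right(operator.sub, [1, 2, 3, 4]))   # 1 - (2 - (3 - 4))
print(reduce_single_slash(operator.add, X))
print(reduce_double_slash(operator.add, X))
```

For associative operators such as + the direction of evaluation is immaterial; the subtraction example shows where it matters.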


9. If a is any argument and ⊙ is any binary operator,
then ⊙^n/a denotes the nth power of a with respect to ⊙.


Formally,
        ⊙^n/a ⇔ a ⊙ a ⊙ ... ⊙ a (to n terms).


Hence ⊙^1/a = a, ⊙^−1/a is the inverse of a with respect
to ⊙, and ⊙^0/a is the identity element of the operator ⊙
(if they exist).

10. If ⊙1 and ⊙2 are binary operators, then the matrix
product A (⊙1 ⊙2) B is a matrix of dimension μA × νB defined by:


        (A (⊙1 ⊙2) B)_j^i = ⊙1/A^i ⊙2 B_j.


In particular, A (+ ×) B denotes the ordinary matrix product.
Moreover, the pair (⊙1 ⊙2) behaves as a binary operator on
A and B and hence may be treated as a binary operator.
For example, applying the notation of part 9, (+ ×)^−1/A
denotes the ordinary inverse of A.


If the post-multiplier is a vector x (i.e., a matrix of one
column), the usual conventions of matrix algebra are
applied:


        (A (+ ×) x)_i = A^i (+ ×) x = +/A^i × x.

Similarly,


        (x (+ ×) B)_j = x (+ ×) B_j, and x (+ ×) y = +/x × y.
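
The generalized matrix product of item 10 can be sketched in Python with the two operators passed as parameters. The names `fold_right` and `general_product` are hypothetical, matrices are assumed to be lists of row vectors, and conformability is not checked.

```python
from functools import reduce
import operator

def fold_right(op, xs):
    """Reduce right to left: x1 op (x2 op (... op xn))."""
    xs = list(xs)
    return reduce(lambda acc, x: op(x, acc), reversed(xs[:-1]), xs[-1])

def general_product(op1, op2, A, B):
    """Element (i, j) is the op1-reduction of row i of A combined
    component-by-component (op2) with column j of B."""
    columns_of_B = list(zip(*B))
    return [[fold_right(op1, [op2(a, b) for a, b in zip(row, col)])
             for col in columns_of_B]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(general_product(operator.add, operator.mul, A, B))   # ordinary product
print(general_product(min, operator.add, A, B))            # a (min, +) product
```

Passing `operator.or_` and `operator.and_` on 0/1 matrices gives the logical matrix product in the same way, without any change to the scan itself.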


11. The outer product of two vectors x and y is denoted
by x (∘ ⊙) y and defined as the matrix M of dimension
νx × νy such that M_j^i = x_i ⊙ y_j.

12. Deletion from a vector x of those components corre- 
sponding to the zeros of a logical vector u of like dimension 
is called compression and is denoted by u/x. Compression 
is extended to matrices both row-by-row and column-by- 
column as follows: 


        Y ← u/X  ⇔ Y^i = u/X^i
        Y ← u//X ⇔ Y_j = u/X_j.
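
A minimal Python sketch of compression follows, assuming logical vectors are lists of 0s and 1s and matrices are lists of row vectors; the function names are hypothetical.

```python
def compress(u, x):
    """u/x: keep the components of x where u is 1."""
    return [xi for ui, xi in zip(u, x) if ui]

def compress_rows(u, X):
    """u/X: compress every row vector, which selects columns."""
    return [compress(u, row) for row in X]

def compress_columns(u, X):
    """u//X: compress every column vector, which selects rows."""
    return [row for ui, row in zip(u, X) if ui]

X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(compress([1, 0, 1], [10, 20, 30]))
print(compress_rows([1, 0, 1], X))
print(compress_columns([0, 1, 1], X))
```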


13. If p is any vector containing only indices of x, then
x_p is defined as follows:


        y ← x_p ⇔ y_i = x_(p_i), i ∈ ι^1(νp).


If p is a permutation vector (containing each of its own
indices once) and if νp = νx, then x_p is a permutation of x.

Permutation is extended to matrices by row and by
column as follows:


        Y ← X_p ⇔ Y^i = (X^i)_p,
        Y ← X^p ⇔ Y_j = (X_j)_p.


14. Left rotation is a special case of permutation denoted
by k ↑ x and defined by


        y ← k ↑ x ⇔ y_i = x_j, j = νx |₁ (i + k)


(that is, i + k reduced modulo νx to the range 1, ..., νx).
Right rotation is denoted by k ↓ x and is defined anal-
ogously.

A noncyclic left rotation (left shift) denoted by ↑₀ is
defined as follows:


        k ↑₀ x ⇔ (~ω^k) × k ↑ x.


(The zero attached to the shaft of the arrow suggests that
zeros are drawn into the “evacuated” positions). Similarly,


        k ↓₀ x ⇔ (~α^k) × k ↓ x.


Rotations are extended to matrices in the usual way, a
doubled symbol (e.g., ⇈) denoting rotation of columns.
For example,

        (k ↑ X)^i = k_i ↑ X^i,

and (kε)⁴ ↓₀ applied to the identity matrix gives a matrix
with ones on the kth superdiagonal.
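
Rotation and shift can be sketched in Python. The sketch uses 0-origin list indices rather than the 1-origin indices of the text, assumes 0 ≤ k ≤ νx, and uses hypothetical function names; the masks play the role of the complemented suffix and prefix vectors.

```python
def rotate_left(k, x):
    n = len(x)
    return [x[(i + k) % n] for i in range(n)]

def rotate_right(k, x):
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]

def shift_left(k, x):
    """Noncyclic left rotation: zeros enter at the right."""
    mask = [1] * (len(x) - k) + [0] * k      # complement of a suffix vector
    return [m * v for m, v in zip(mask, rotate_left(k, x))]

def shift_right(k, x):
    """Noncyclic right rotation: zeros enter at the left."""
    mask = [0] * k + [1] * (len(x) - k)      # complement of a prefix vector
    return [m * v for m, v in zip(mask, rotate_right(k, x))]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(rotate_left(1, [1, 2, 3, 4]))
print(shift_left(1, [1, 2, 3, 4]))
# Shifting every row of the identity right by 1 puts the ones
# on the first superdiagonal.
print([shift_right(1, row) for row in identity])
```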


15. Any new operator defined (e.g., by some algorithm,
usually referred to as a subroutine) is to be denoted in
accordance with Definition (2) and is extended to arrays 
exactly as any of the basic operators defined in the lan- 
guage. For example, if x gcd y (or, better, x | y) is used to 
denote the greatest common divisor of integers x and y, 
then x | y, | / x, and X x y are automatically defined. 
Moreover, if n is a vector of integers and F’ represents 
the prime factorization of n; with respect to the vector 
of primes p (that is, n = F % p), then clearly | / n = 
(L//F) % p. Similarly, if x | y denotes the l.c.m. of x 
and y, then | /n = (T//F) É p. 
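
The factorization identities can be checked numerically. The following Python sketch uses an illustrative vector of primes and exponent matrix that are assumptions, not data from the paper: the g.c.d. of the components of n is rebuilt from the column-wise minima of F, and the l.c.m. from the column-wise maxima.

```python
from functools import reduce
from math import gcd, prod

primes = [2, 3, 5]          # the vector p (illustrative)
F = [[2, 1, 0],             # row i holds the prime exponents of n_i
     [1, 2, 0],
     [1, 1, 1]]

def from_exponents(e):
    """Product over the primes of p_j raised to the exponent e_j."""
    return prod(p ** k for p, k in zip(primes, e))

def lcm(a, b):
    return a * b // gcd(a, b)

n = [from_exponents(row) for row in F]        # 12, 18, 30
col_min = [min(col) for col in zip(*F)]       # min-reduction over the rows
col_max = [max(col) for col in zip(*F)]       # max-reduction over the rows

gcd_direct, gcd_by_identity = reduce(gcd, n), from_exponents(col_min)
lcm_direct, lcm_by_identity = reduce(lcm, n), from_exponents(col_max)
print(gcd_direct, gcd_by_identity, lcm_direct, lcm_by_identity)
```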


Array Operations in a Compiler 


The systematic extension of the familiar vector and 
matrix operations to all operators, and the introduction 
of the generalized matrix product, greatly increase the 
utility and frequency of use of array operations in pro-
grams, and therefore encourage their inclusion in the
source language of any compiler. Array operations can, of
course, be added to the repertoire of any source language
by providing library or ad hoc subroutines for their exe-
cution. However, the general array operations spawn a
host of useful identities, and these identities cannot be 
mechanically employed by the compiler unless the array 
operations are denoted in such a way that they are easily 
recognizable. 

The following example illustrates this point. Consider 
the vector operation 


        x ← x + y


and the equivalent subroutine (expressed in ALGoL and 
using vx as a known integer): 


for i := 1 step 1 until νx do
    x(i) := x(i) + y(i)


⁴ The ε may be elided.
It would be difficult to make a compiler recognize all
legitimate variants of this program (including, for example, 
an arbitrary order of scanning the components), and to 
make it distinguish the quite different and essentially 
sequential program: 

for i := 1 step 1 until νx − 1 do
    x(i+1) := x(i) + y(i)


The foregoing programs could perhaps be analyzed by 
a compiler, but they are merely simple examples of much 
more complex scan procedures which would occur in, say, 
a matrix product subroutine. A somewhat more complex 
case is illustrated by the vector operation z ← k ↑ x,
and the equivalent ALGOL program: 


for i := 1 step 1 until νx do begin
    if i + k ≤ νx then j := i + k
    else j := i + k − νx;
    z(i) := x(j); end


Finally, there is a distinct advantage in incorporating
array operations by providing a single general scan for 
each type (e.g., vector, matrix, and matrix product) and 
treating the operator (or operators) as a parameter. It 
then matters not whether each operator is effected by a 
one-line subroutine (i.e., a machine instruction) or a multi- 
line subroutine, or whether it is incorporated in the array 
operation as an open or a closed subroutine. If several 
types of representations are permitted for variables (e.g., 
double precision, floating point, chained vectors), then a 
scan routine may have to be provided for each type of 
representation. 
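
The idea of one general scan per array type, with the operator as a parameter, can be sketched in Python. The names `vector_scan` and `matrix_scan` are hypothetical; the operator may be a built-in or any user-defined function, which corresponds to the one-line and multi-line subroutines mentioned above.

```python
import math
import operator

def vector_scan(op, x, y=None):
    """Single scan for all vector operations; op is a parameter."""
    if y is None:                                   # unary operator
        return [op(a) for a in x]
    if len(x) != len(y):
        raise ValueError("vectors are not conformable")
    return [op(a, b) for a, b in zip(x, y)]

def matrix_scan(op, X, Y=None):
    """The matrix scan reuses the vector scan row by row."""
    if Y is None:
        return [vector_scan(op, row) for row in X]
    if len(X) != len(Y):
        raise ValueError("matrices are not conformable")
    return [vector_scan(op, r, s) for r, s in zip(X, Y)]

print(vector_scan(operator.add, [1, 2], [3, 4]))
print(vector_scan(math.floor, [1.5, -1.5]))
print(vector_scan(math.gcd, [12, 18], [8, 27]))     # a "new" operator
print(matrix_scan(operator.mul, [[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```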


Identities 


The identities fall naturally into five main classes: 
duality, partitioning (selection), permutation, associativity 
and distributivity, and transposition. A few examples of 
each class will be presented together with a brief discussion 
of their uses. 

In discussing identities it will be convenient to employ
the symbols ⊙, ⊙1, ⊙2, ρ, σ, and τ to denote operators,
and to define certain functions and relations on operations
as follows. The (unary) logical functions α⊙ and γ⊙
are equal to unity iff ⊙ is associative and ⊙ is commuta-
tive, respectively. The relation ⊙1 δ ⊙2 holds iff ⊙1 dis-
tributes over ⊙2, and ⊙1 α ⊙2 holds iff ⊙1 associates
with ⊙2, that is,


        (x ⊙1 y) ⊙2 z ⇔ x ⊙1 (y ⊙2 z).


This latter is clearly a generalization of associativity, that
is, ⊙1 α ⊙1 ⇔ α⊙1. Finally, the unary operator δ applied
to the operator ⊙1 (denoted by δ⊙1) produces the
operator ⊙2 which is dual to ⊙1 in the sense defined in
Table 4 (which summarizes these functions) and in Sub-
section (a) below.


TABLE 4. OPERATIONS AND RELATIONS DEFINED ON OPERATORS

Self-associativity   α⊙ = 1 iff x ⊙ (y ⊙ z) ⇔ (x ⊙ y) ⊙ z
Commutativity        γ⊙ = 1 iff x ⊙ y ⇔ y ⊙ x
Distributivity       ⊙1 δ ⊙2 = 1 iff x ⊙1 (y ⊙2 z) ⇔ (x ⊙1 y) ⊙2 (x ⊙1 z)
Associativity        ⊙1 α ⊙2 = 1 iff x ⊙1 (y ⊙2 z) ⇔ (x ⊙1 y) ⊙2 z
Dual wrt τ           δ⊙ is an operator such that
                       (δ⊙)x ⇔ τ⊙τx if ⊙ is unary
                       x (δ⊙) y ⇔ τ((τx) ⊙ (τy)) if ⊙ is binary.

All of the identities are based upon the fundamental
properties of the elementary operators summarized in
Tables 5-8. Table 5 shows the vector a of binary arith-
metic operators and below it two logical matrices describ-
ing its properties of distributivity and associativity. These
matrices show, for example, that a_3 (that is, ×) dis-
tributes over + and −, that ⌈ and ⌊ distribute over
themselves and each other, and that × associates with
itself and ÷. The first four rows of the table show the
self-associativity of a (equal to the diagonal of the outer
product matrix a (∘ α) a), the commutativity, and the dual
operators, wrt ÷ and −, respectively.

Table 6 shows three alternative ways of denoting the
16 binary logical functions: as the vector of operators l,
as the matrix T of characteristic vectors (T_j is the
characteristic vector of operator l_j), and as the vector ⊥T
obtained as the base-two values (expressed in decimal) of
the columns of T. The symbols employed in l include
the familiar symbols ∨ and ∧ for or and and, ∨̄ and
∧̄ for their complements (i.e., the Peirce function and
the Sheffer stroke), 0 and 1 for the zero and identity
functions, the six numerical relations <, ≤, =, ≥, >,


TABLE 5. PROPERTIES OF THE BINARY ARITHMETIC OPERATORS


1 0 1 0 1 1 0 0} aa 
1 01 01 10 0} ya 
X: T | da (wrt +) 
+ — to r ! da (wrt —) 
+ - x + fF L | m}? a 
+ 0 0 0 01 10 0) 
= — 0 0 0 0 r ro. o0 
> x 1 1 0 00 01 0 
3 + r r 0 0 0 0 r 0 Eu 
E r 0 0 0 0 1 10 0 
% L 0 0 0 0 1 10 0 
= | 0 0 0 0 000 0 
m 0.0 2 l l 10 0 
+ 1 100 000 0) 
> = 0 0 0 0 0 00 0 
= x 0 0 1 1 0 00 0 | 
2 = 0 0 0 0 0 0 0 0 , 
E r 0 0 0 0 1 00 0% 
2 L 0 0 0 0 0 1 O 0 | 
a 0000 000 o0) 
=m 0 000000 0) 


Tl and r denote left and right distributivity. 


TABLE 6. PROPERTIES OF THE BINARY LOGICAL OPERATORS 


1 10101110100000 1}al 
1 1000011110000 1 i i143y 
1V> eS o=AAXB> a< V 0} 
0000000011111111 
0-00 ST 3:21.00 EI As E Ne 
00110011001100 11i 
0101010101010101) 
DA> a<o xx VV= 5624S All 
012 3 45 67 8 9101112131415} UT 

001111111100000000) 
NAN 11 111111100000000) 
> /)0 001010000000000) 
« 3010101010000000 0; 
< 41 111111100000000 
>o 51 AA EMAIL ee 
5x60001010000101000 
3V O Oe 0 00 O E00 
£ v80001010000000000 
z= 90001010000101000 
= aloo 001010000101000 
>11;0001010000000 00 0 
aizi010101010000000 0; 
<10101010101010101| 
A14000 1010000000000. 
115010 1010101010101) 
0 A ax< oA V ve a ex <= A 131 
012 3 4 5 67 8 9101112131415} LT 
0011110000000000 0 0) 
A v111100000000000 0 
> 2000100000000 0 0 0 0 
«x 30001 000000000000 
< 341 1111000000000000) 

Sa LEE ii 

Z#6000100100100100%0 

3V 7000100010001000 1| g; 

S v 8000100000000000 0 

2- 90001 001001001000 

< a100 001001001001000 

> 110 001000000000000) 

«120001 000000000000 

<13000100010001000 1! 

A140 0010000000000 0 0 

150001000100010001 
* Duality with respect to ~. 


≠, and the symbols α, ω, ᾱ, and ω̄ for the four “unary”
functions, that is, x α y = x, x ω y = y, x ᾱ y = x̄ and
x ω̄ y = ȳ.

The remaining portion of Table 6 is arranged like Table
5. Since ((αl) ∧ γl)/l = (0, ∧, ≠, ∨, =, 1), it follows
that the only nontrivial associative commutative logical
operators are g = (∧, ∨, ≠, =). The properties of this
particularly useful subset (abstracted from Table 6) are
summarized in Table 7.
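
The selection of the associative commutative operators can be reproduced by exhaustive search. In this Python sketch each of the 16 logical functions is identified by the base-two value of its characteristic vector over the argument pairs (0,0), (0,1), (1,0), (1,1); that numbering follows the rows of T in Table 6, while the helper names are hypothetical.

```python
from itertools import product

BITS = (0, 1)

def logical(j):
    """Binary logical function whose characteristic vector has base-two value j."""
    return lambda x, y: (j >> (3 - (2 * x + y))) & 1

def associative(f):
    return all(f(f(x, y), z) == f(x, f(y, z))
               for x, y, z in product(BITS, repeat=3))

def commutative(f):
    return all(f(x, y) == f(y, x) for x, y in product(BITS, repeat=2))

both = [j for j in range(16)
        if associative(logical(j)) and commutative(logical(j))]
print(both)
```

The indices found are 0, 1, 6, 7, 9 and 15, that is, the zero function, and, not-equal, or, equal, and the identity function; dropping the two constant functions leaves the vector g.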

Certain functions of the matrices l α l and l δ l are also of
interest; for example, the matrix (l α l) > (l δ l) shows that
there are only six operator pairs which are associative
and not distributive, namely, (≠, ≠), (≠, =), (=, ≠),
(=, =), (ω̄, ≠) and (ω̄, =).
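
The count of six can be confirmed by brute force over all 256 operator pairs. The Python sketch below applies the definitions of Table 4 (associativity of a pair, and distributivity in the left-hand form given there) and again numbers each function by the base-two value of its characteristic vector; the helper names are hypothetical.

```python
from itertools import product

BITS = (0, 1)

def logical(j):
    """Binary logical function whose characteristic vector has base-two value j."""
    return lambda x, y: (j >> (3 - (2 * x + y))) & 1

def associates(f, g):
    """(x f y) g z equals x f (y g z) for all arguments."""
    return all(g(f(x, y), z) == f(x, g(y, z))
               for x, y, z in product(BITS, repeat=3))

def distributes(f, g):
    """x f (y g z) equals (x f y) g (x f z) for all arguments."""
    return all(f(x, g(y, z)) == g(f(x, y), f(x, z))
               for x, y, z in product(BITS, repeat=3))

pairs = [(i, j) for i in range(16) for j in range(16)
         if associates(logical(i), logical(j))
         and not distributes(logical(i), logical(j))]
print(pairs)
```

In this numbering 6 is not-equal, 9 is equal, and 10 is the function x ω̄ y = ȳ.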


(a) DUALITIES 

A unary operator τ is said to be self-inverse if ττx ⇔ x.
If ρ, σ and τ are unary operators, if τ is self-inverse,
and if ρx ⇔ τστx, then σx ⇔ τρτx, and ρ and σ are said to
be dual⁵ with respect to τ. The floor and ceiling operators
⌊ and ⌈ are obviously dual with respect to the minus
operator. Duality clearly extends to arrays, e.g.,


        ⌈x ⇔ −⌊−x.


The duals of unary operators are shown in Table 8 as
the vector δc.

If ρ and σ are binary operators, if τ is a self-inverse
unary operator, and if


        x ρ y ⇔ τ((τx) σ (τy)),


then ρ and σ are said to be dual with respect to τ. The max
and min operators (⌈ and ⌊) are dual with respect to
minus, and or and and (∨ and ∧) are dual with respect
to negation (~), as are the relations ≠ and =.

Dual operators are displayed in the vectors ôa and 
öl of Tables 5 and 6. Each of the 16 logical operators has 
a dual: 


él, = Lo-r, . 


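The duality of binary operators $\rho$ and $\sigma$ also extends to
vectors and matrices. Moreover, when they are used in
reduction, the following identities hold:

$\rho/x \leftrightarrow \tau\sigma/\tau x,$
$\rho/X \leftrightarrow \tau\sigma/\tau X,$
$\rho//X \leftrightarrow \tau\sigma//\tau X.$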


TABLE 7. PROPERTIES OF THE NONTRIVIAL ASSOCIATIVE 
COMMUTATIVE LOGICAL OPERATORS 


[Table body illegible in the source scan.]
TABLE 8. PROPERTIES OF THE UNARY OPERATORS 
[Table body illegible in the source scan.]


5 Abbreviated as “dual wrt”. 
For example,


$\lfloor/x \leftrightarrow -\lceil/-x,$
and
$\land/x \leftrightarrow \sim\lor/\sim x$ (De Morgan's Law).
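These dualities are easy to confirm numerically. The sketch below is an illustration in Python rather than in the paper's notation; the sample vectors `x` and `b` are arbitrary assumptions, and `reduce` stands in for the reduction operator.

```python
import math
from functools import reduce

x = [2.5, -1.25, 7.0, 0.5]   # arbitrary sample values
b = [1, 1, 0, 1]             # arbitrary logical vector

# Unary duality with respect to minus: floor(v) equals -ceil(-v).
floor_ok = all(math.floor(v) == -math.ceil(-v) for v in x)

# Min-reduction expressed through max-reduction and negation.
min_direct = reduce(min, x)
min_by_duality = -reduce(max, [-v for v in x])

# And-reduction expressed through or-reduction and logical negation
# (De Morgan's law); 1 - v plays the role of negation on 0/1 values.
and_direct = reduce(lambda p, q: p & q, b)
and_by_duality = 1 - reduce(lambda p, q: p | q, [1 - v for v in b])

print(floor_ok, min_direct, min_by_duality, and_direct, and_by_duality)
```

Each pair of results agrees, which is all the identities assert; the check says nothing about which form a given machine executes faster.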


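The basic reduction identity (namely, $\rho/x \leftrightarrow \tau\sigma/\tau x$)
leads immediately to the following family of identities
for the matrix product:

$A \overset{\rho_1}{\rho_2} B \leftrightarrow \tau((\tau A) \overset{\sigma_1}{\sigma_2} (\tau B)),$

where $\sigma_1$ and $\sigma_2$ are the duals of $\rho_1$ and $\rho_2$ with respect to $\tau$.
For the logical operators, the family comprises 256 identities,
of which 144 are nontrivial.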

Duality relations can be specified for a compiler by a
table incorporating $l$ and $\delta l$, and can be employed to
obviate the inclusion of a subroutine for one of the dual
pair or to transform a source statement to an equivalent
form more efficient in execution. For example, in a computer
such as the IBM 7090 (which executes an or between
registers (i.e., logical vectors) much faster than a
corresponding and, and which quickly performs an or
over a register (i.e., a test for non-zero)), the operation
$\sim(\sim x) \land y$ is more efficiently executed as the equivalent
operation $x \lor \sim y$, obtained by duality.


(b) PARTITIONING 

Partitioning identities, which permit a segment of a 
vector result to be expressed in terms of segments of the 
argument vectors, are of obvious utility in the efficient 
allocation of limited capacity high-speed storage. 

If $z \leftarrow x \bigcirc y$, then $u/z \leftarrow (u/x) \bigcirc (u/y)$, where $u$ is
an arbitrary (but conformable) logical vector. This simple
identity applies for any binary operator $\bigcirc$ and permits
any vector operation to be partitioned or segmented at
will. A similar identity holds for unary operators.

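From the definition of the matrix product it is clear
that for any binary operators $\rho$ and $\sigma$,


$u/(A \overset{\rho}{\sigma} B) \leftrightarrow A \overset{\rho}{\sigma} (u/B),$
and
$u//(A \overset{\rho}{\sigma} B) \leftrightarrow (u//A) \overset{\rho}{\sigma} B.$
If $\rho$ is any associative commutative operator (i.e.,
$\alpha\rho = \gamma\rho = 1$), then


$\rho/x \leftrightarrow (\rho/\bar{u}/x) \, \rho \, (\rho/u/x),$


where $\bar{u}$ is used as an alternative notation for $(\sim u)$.
Consequently,


$A \overset{\rho}{\sigma} B \leftrightarrow ((\bar{u}/A) \overset{\rho}{\sigma} (\bar{u}//B)) \, \rho \, ((u/A) \overset{\rho}{\sigma} (u//B)).$


Since the distributivity of $\sigma$ and $\rho$ is not involved, the
foregoing identity (which is a simple generalization of
the familiar identity for the product of partitioned
matrices) applies to most of the common arithmetic and
logical operators.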
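The two-way partitioning of a matrix product can be tried out on a small case. The Python sketch below is an assumption-laden illustration: `matrix_product`, `compress_columns` and `compress_rows` are hypothetical helpers standing in for the generalized matrix product and for column and row compression by a logical vector, and the sample matrices are arbitrary. It uses plus as the reducing operator and times as the component operator.

```python
from functools import reduce
import operator

def matrix_product(rho, sigma, A, B):
    """Generalized product: entry (i, j) is the rho-reduction over k
    of sigma(A[i][k], B[k][j])."""
    return [[reduce(rho, (sigma(A[i][k], B[k][j]) for k in range(len(B))))
             for j in range(len(B[0]))] for i in range(len(A))]

def compress_columns(u, M):
    """Keep the columns of M at positions where u is 1."""
    return [[v for keep, v in zip(u, row) if keep] for row in M]

def compress_rows(u, M):
    """Keep the rows of M at positions where u is 1."""
    return [row for keep, row in zip(u, M) if keep]

A = [[1, 2, 3, 4], [5, 6, 7, 8]]
B = [[1, 0], [2, 1], [0, 3], [4, 1]]
u = [1, 0, 0, 1]
u_bar = [1 - v for v in u]          # logical negation of u

whole = matrix_product(operator.add, operator.mul, A, B)
part_u_bar = matrix_product(operator.add, operator.mul,
                            compress_columns(u_bar, A), compress_rows(u_bar, B))
part_u = matrix_product(operator.add, operator.mul,
                        compress_columns(u, A), compress_rows(u, B))
# Combine the two partial products with the reducing operator (here +).
combined = [[p + q for p, q in zip(r1, r2)]
            for r1, r2 in zip(part_u_bar, part_u)]
print(whole, combined)
```

Only the associativity and commutativity of the reducing operator are used when the partial products are combined, so the same check goes through with, say, `min` in place of `operator.add` throughout.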


The identity for the two-way partitioning effected by
$\bar{u}$ and $u$ can obviously be extended to a $(\mu P)$-way
partitioning effected by a logical partition matrix $P$ (defined
by $\epsilon = +//P$) as follows:


AS Bo pt “P /( P*/A) & ( P*//B). 


This is the form most useful in allocating storage; if fast- 
access storage for 2n components of A and B were avail- 
able, P would normally be chosen such that P* = 
MOTE A: ae 

(c) PERMUTATION

In this section, p, q and r will denote permutation 
vectors of appropriate dimensions. 

If $\bigcirc$ is any binary operator, then


$(x \bigcirc y)_p \leftrightarrow x_p \bigcirc y_p,$


i.e., permutation distributes over any binary operator.
For any unary operator $\bigcirc$,


$(\bigcirc x)_p \leftrightarrow \bigcirc (x_p),$


and permutation therefore commutes with any unary
operator. Consider, for example, a vector $x$ whose
components are arranged in increasing order on some
function $g(x_i)$ [e.g., lexical order so as to permit binary search]
but is represented by (i.e., stored as) the vector $y$ in
arbitrary order and the permutation vector $p$ such that
$x = y_p$. Then the operation $z \leftarrow \bigcirc x$ may be executed as
$w \leftarrow \bigcirc y$, where $z = w_p$.
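For any binary operators $\rho$ and $\sigma$,


$(A \overset{\rho}{\sigma} B)^p_q \leftrightarrow A^p \overset{\rho}{\sigma} B_q. \qquad (1)$

Moreover, if $\alpha\rho = \gamma\rho = 1$, then $\rho/x \leftrightarrow \rho/x_p$, and
consequently

$A_r \overset{\rho}{\sigma} B^r \leftrightarrow A \overset{\rho}{\sigma} B. \qquad (2)$


Finally, then

$(A \overset{\rho}{\sigma} B)^p_q \leftrightarrow A^p_r \overset{\rho}{\sigma} B^r_q.$
This single identity permits considerable freedom in
transforming a matrix product operation to a form best suited
to the access limitations imposed by the representation
(i.e., storage allocation) used for A and B (e.g., row-by-
row and column-by-column lists).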
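The combined permutation identity can be illustrated concretely: permuting the rows and columns of a product gives the same result as permuting the rows of A and the columns of B, while any common permutation is applied to the columns of A and the rows of B. The Python sketch below is an illustration under assumptions: the helper names are hypothetical, permutation vectors are 0-origin, and min and plus are chosen as the operator pair (min being associative and commutative, as the identity requires).

```python
from functools import reduce
import operator

def matrix_product(rho, sigma, A, B):
    """Generalized product: rho-reduction over k of sigma(A[i][k], B[k][j])."""
    return [[reduce(rho, (sigma(A[i][k], B[k][j]) for k in range(len(B))))
             for j in range(len(B[0]))] for i in range(len(A))]

def permute_rows(p, M):
    """Row i of the result is row p[i] of M."""
    return [M[i] for i in p]

def permute_columns(q, M):
    """Column j of the result is column q[j] of M."""
    return [[row[j] for j in q] for row in M]

A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [2, 1], [0, 3]]
p, q, r = [1, 0], [1, 0], [2, 0, 1]   # arbitrary permutation vectors

# Left side: permute the finished product.
lhs = permute_columns(q, permute_rows(p, matrix_product(min, operator.add, A, B)))

# Right side: permute the operands first; r is applied to the common dimension.
rhs = matrix_product(min, operator.add,
                     permute_columns(r, permute_rows(p, A)),
                     permute_columns(q, permute_rows(r, B)))
print(lhs, rhs)
```

The freedom in choosing r is what allows the operands to be scanned in whatever order their storage representation favors.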

For the special case $q = \iota$, $A = I$, $\rho = +$, and $\sigma = \times$,
equation (1) reduces to the well-known method of
permuting the columns of a matrix by ordinary
premultiplication by a permutation matrix $I^p$, that is,


$B^p \leftrightarrow I^p \overset{+}{\times} B.$


The fact that N? and O NP are inverse permutations 
(ie, (ON?) X NP = N) is obtainable directly from 
equation (2) and the fact that O NP = (SMN), = Np. 

The rotation operators $\uparrow$, $\downarrow$, $\Uparrow$, $\Downarrow$ are special cases of
permutations; consequently,


JIk 7 AEB (TASK T B). 


Moreover, this identity still holds when the cyclic rota- 
tion operators are replaced by the corresponding non- 
cyclic operators l ; i , fl , and | . In particular, 


¡MB=jl OB =G ÎN) $B, 


and 1f 


then 
At) ENasTRIN=GTNMEaTN,, 


a well-known identity for the superdiagonal matrices 


h Î Nandk Î N. 


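(d) ASSOCIATIVITY AND DISTRIBUTIVITY OF DOUBLE
OPERATORS
If $\alpha\rho = \gamma\rho = \sigma\delta\rho = 1$, then $\alpha(\overset{\rho}{\sigma}) = 1$; that is,


$A \overset{\rho}{\sigma} (B \overset{\rho}{\sigma} C) \leftrightarrow (A \overset{\rho}{\sigma} B) \overset{\rho}{\sigma} C.$
Moreover, $(\overset{\rho}{\sigma})\delta\rho = 1$; that is

$A \overset{\rho}{\sigma} (B \rho C) \leftrightarrow (A \overset{\rho}{\sigma} B) \rho (A \overset{\rho}{\sigma} C).$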


For example, if $C$ is the connection matrix of a directed
graph, then $B = C \overset{\lor}{\land} C$ is the matrix of connections of
length two; the operator $(\overset{\lor}{\land})$ is associative and distributes
over $\lor$. Similarly, if $D$ is a distance matrix ($D^i_j$ is the
distance from point $i$ to point $j$), then $E = D \overset{\lfloor}{+} D$ is
the matrix of minimum distance for trips of two legs;
$(\overset{\lfloor}{+})$ is associative and distributes over $\lfloor$.
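Both examples can be reproduced with one generalized product routine. The Python sketch below is only an illustration: `matrix_product` is a hypothetical helper, and the three-node graph and the distance matrix are made-up sample data.

```python
from functools import reduce
import operator

def matrix_product(rho, sigma, A, B):
    """Generalized product: rho-reduction over k of sigma(A[i][k], B[k][j])."""
    return [[reduce(rho, (sigma(A[i][k], B[k][j]) for k in range(len(B))))
             for j in range(len(B[0]))] for i in range(len(A))]

# Connection matrix of the directed cycle 0 -> 1 -> 2 -> 0.
C = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
# Or-and product: entry (i, j) is 1 if some k has i -> k and k -> j.
length_two = matrix_product(operator.or_, operator.and_, C, C)

# Symmetric distance matrix between three points.
D = [[0, 4, 9],
     [4, 0, 2],
     [9, 2, 0]]
# Min-plus product: shortest total distance over trips of two legs.
two_legs = matrix_product(min, operator.add, D, D)

# Associativity of the min-plus product on this example.
left = matrix_product(min, operator.add, two_legs, D)
right = matrix_product(min, operator.add, D, two_legs)
print(length_two, two_legs, left == right)
```

In the distance example the direct leg of length 9 between points 0 and 2 is replaced by the two-leg trip through point 1, of length 6.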


The associativity of matrix product operators can be 
very helpful in arranging an efficient sequence of cal- 
culations on matrices stored row-by-row or column-by- 
column. For the logical operators, the number of asso- 
ciative double operators is given by the expression 


+/+/(al)/161 


which (according to Table 6) has the value 66. 
(e) TRANSPOSITIONS 


Of the unary transposition operators, € and © are 
special cases of permutation, but © and © are not. 
Table 9 shows the multiplication table for the group 
generated by these four transpositions. The notation 
chosen for the four added operators is clear: © denotes 
the identity, p > 08 = 90, 9 08 (90° axial 
left rotation), and © > O6© (axial right rotation). 
Since @ 2039, it could as well have been denoted by 
o. 

The following illustrate the many transposition iden- 
tities: 


OASB< As OB (3) 
8AsB > (04)+B (4) 


TABLE 9. GROUP OF TRANSPOSITIONS (rotations of the square)


[Table body illegible in the source scan.]
AsBA(OANsOB if ap = yp = 1 (5) 
©(A2B) > (OB)E(OA) if yo =1 (6) 


@(A2B) > (OB)2(@A) if ap =p = 0 =1. (7) 


Identities (3)-(5) are special cases of the permutation 
identities and permit freedom in the order of scan, which 
may be important if a backward-chained representation is 
employed for the vectors involved. Identity (6) is the 
generalization of the well-known transposition identity of 
matrix algebra. Identity (7) is obtained directly from (6) 
by the application of (3), (4) and (5). 


Conclusion 


The use of a programming language in which elementary 
operations are extended systematically to arrays provides 
a wealth of useful identities. If the array operations are 
incorporated directly in a compiler for the language, these 
identities can be automatically applied in compilation, 
using a small number of small tables describing the funda- 
mental properties of the elementary operators. Moreover, 
the identities can be extended to any ad hoc operators 
specified by the source program, provided only that the 
fundamental characteristics (associativity, etc.) of the ad 
hoc operators are supplied. 


Exploitation of the identities within the compiler will, 
of course, increase the complexity of the compiler, and one 
would perhaps incorporate only a selected subset of them. 
However, the possibility of later extensions to exploit 
further identities is of some value. Finally, the identities 
are extremely useful to the programmer (as opposed to the 
analyst who specifies the overall procedure and who may 
use the identities in theoretical work), since the tricks 
used by the programmer, as in allocating storage (par- 
titioning) or modifying the sequence of a scan (permu- 
tation), are almost invariably special cases of the more 
general identities outlined here. 
DISCUSSION
Gorn: Some almost ancient sources of generalized operators
are: Whitehead, Universal Algebra, and Grassmann, Die
Ausdehnungslehre. Some more modern sources are: Bourbaki, Algebra,
Forder, Calculus of Extension, and Bodewig, Matrix Calculus.

Backus: Why this comment?

Gorn: The paper presents generalized relationships among
operators. The cited references are directly concerned with such
questions.

Brooker: Why do you insist on using a notation which is a
nightmare for typist and compositor and impossible to implement
with punching and printing equipment currently available? What
proposals have you got for overcoming this difficulty?

Iverson: Transliteration is, of course, essential, but I have
avoided its treatment first, because a suitable scheme is highly
dependent on the particular equipment available, and second,
because it is extremely simple. If, for example, you have the
stamina of ALGOL and MAD users (who tirelessly write
PROCEDURE and WHENEVER), then you can use the distinct
names that I have given (for conversational purposes) to each of
the operators. Anyone who prefers briefer symbols can (as I have)
easily design schemes which are brief, simple and mnemonic.

Gorn: This question of transliteration: I'm not talking about
this paper in particular. In general it is a problem that is always
with us. There is a danger that as the transliteration rules become
more complicated replacement productions, we rapidly fall into a
recognition problem, a translation problem and possibly an
unsolvable word problem.

Iverson: Yes, one should distinguish the recognition of
identifiers from the syntax, which is of more concern to the ultimate
user.

Brooker: It is not obvious to me that these two symbols for
FLOOR and CEILING have a great deal of mnemonic value.

Iverson: Yes, but once you have read it, you can remember it.

Gorn: But the more redundancy you put in the symbolism of
a language, the more equivalence problems you have.

Iverson: Not problems. I suggest that these are assets. In the
extreme we could go back to the Assign and the Sheffer stroke,
let's say, and then we have no problems.

Ross: I don't remember who asked the original question
about notation, but I submit that they find themselves a
sugar-daddy or someone with a few thousand bucks and get themselves
a display console such as we're getting with programmable
characters. You can even publish from it by taking pictures. I don't
see why we should let mechanics influence our progress at all.

Iverson: Someone who is interested in standardization would
not like that comment; a 48-character set is the thing, you know.
The limitation on the available character set, I think, is more of
a transient phenomenon than the algorithms we want to describe.

Ross: With our console the 48 characters are available, and
there is another mode where you can program any bit patterns
you want in a matrix; we are doing this specifically for this
purpose because we feel that the notation that goes along with the
set of ideas should be usable.

Bauer: I would say that compared with some other existing
proposals for matrix extensions such as that of Ershov this is a
much more closed consistent system. No one can say how
far we will go in using such a language in the near future.

Iverson: Let me comment that it is useful to distinguish two
reasons for learning a language; one is for description and analysis
and the other is for automatic execution. I submit that this kind
of formalism is extremely helpful in analyzing difficult problems
without worrying about whether one wants to execute the resulting
program. As a matter of fact, I would use this as a preliminary
before going into some language that is executable.

Gorn: As I have it, the descriptive language you have does
have direct translation properties into command language.

Holt: I would like to translate that comment of Ken's about
description, analysis, and execution in the following way:
Programming languages are machine-dependent; one is appropriate
for the human processor and another for the computer.

Iverson: Well, I would disagree with that because I would use
exactly the same notation for describing the computer. In fact,
I've done it for the 7090 or most of the 7090, and other machines
as well. In fact, you can say the instruction set of the machine is
another form of language with a slave to execute it.

Holt: Then, what was the meaning of your comment?

Iverson: At this point, for example, this is not a source
language in the sense that there is a mechanism available for
translating it into some other language. There is no convenient way for
automatic execution by translation or direct execution. Now I
suggest that the notation is worthwhile just for analysis even
though later we have to do a hand translation into some executable
language.

Green: If I may interrupt for a moment, I think we should limit
the discussion on notation to the next five minutes. And we should
get to other questions.

Tompkins: There exist problems around here which were coded
first in essentially this language and then were translated with
great care into FORTRAN, for example.

Perlis: How should this language be used on computers?
For what class of problems, on or off computers? Thus, it's not
quite clear to me that a mathematical proof of an algorithm
written in FORTRAN (the same algorithm if you will) is any more
difficult than a mathematical proof of one of your algorithms.
Algorithms are written for two reasons: 1) execution by computer,
which means that it is pointless to write it if you cannot execute
it on a computer, and 2) for description and analysis. Now if the
description is difficult to read, then it fails somewhat. If, in
addition, analysis is as difficult, say, as in ALGOL, then the virtue
of the language is questionable.

A last question: You haven't discussed at all the way you
describe the data. It is not clear that you have a notation for describing
data, though you have a great wealth of notation for manipulating
data once it is described. Now ALGOL will obviously be extended to
include matrix and vector operations in expressions. So my
question is: for what classes of problems, remembering you have no
data description, is your description better than “Algolic”
description?


Iverson: I'm not sure if I can really separate all these points.
The question of representation (data description) is too lengthy
to treat here. To save time, let me say that I discuss it in Chapter
3 of my book. This discussion is fairly limited, but adequate.
Concerning the virtues of the language for description and
analysis, I can only say that I have found it very useful in many
diverse areas, including machine description, search procedures,
symbolic logic, sorting and linear programming. Now it is a
separate problem as to whether you want to incorporate the
complete generality of the language in any particular compiler,
but I suggest that it is desirable to have a more general system
that you retract from for any particular compiler rather than
adding ad hoc provisions to more limited languages.

As to the question of proofs, you can, of course, translate a
proof in any language to any other language, but I suggest that
the proofs I give are the kind that are immediately obvious to any
mathematician. There is, of course, the question of to whom you
want your proofs to be obvious. Likewise, for difficulty of reading,
the question is, “for whom?” And I suggest that anybody who
has ever dealt with matrix operations finds this notation very easy
to read.

Perlis: But is it fair to say then that if one is going to create 
or extend a language that the direction of extension really isn’t 
critical—that the accent should not be put on operations so much, 
but on data representation or sequence rules? 

Iverson: No, I disagree. 

Gorn: Since you are supporting an infix notation for binary 
operators, would it not be useful to have some control operators 
in the language which would correspond to the combinatory 
logician’s ‘‘Application’’ operation? Also operators for insertion 
and deletion of parentheses, and operators to adjust priorities 


in the scopes of other operators, e.g. to construct precedence 


matrices of the type discussed by Floyd? 

Iverson: Let me give a sort of general answer to this sort of 
thing. You’re probably talking about some specialized application 
for which you want special operators. I submit that no one can 
design a language that is equally useful for everybody. Instead, 
what you would like to have is a single core which you can extend 
in a straightforward manner. 

In so far as precedence and hierarchy are concerned, I have not
found any great need for them in my work, but I can understand
why you might want to use them in compilers. In fact, I think such
hierarchy should be included in a tabular form so that it is easily
changeable.

Holt: The presentation is a marvelous demonstration of the 
power of notation in the hands of a very clever man. Conclusions: 
(1) Let us teach this skill to clever people. (2) Let us create ma- 
chine mechanisms to respond to notational inventions. 

Iverson: On the contrary, the basic notions are very simple and
should be introduced at high school level to provide a means for
describing algorithms explicitly. For example, the vector can be
introduced as a convenient means for naming a family of variables
and can be used by the student (together with a few very simple
operators) to work out explicit algorithms for well-known
operations such as decimal addition, polynomial evaluation, etc. A
little notation and much care in requiring explicit algorithms
would, in fact, clarify and simplify the presentation of elementary
mathematics and obviate the teaching of programming as such.

Gosden: Many of the equivalences only become useful and 
powerful when time dependency is included. For example, each 
operation on any array implies serial or parallel execution com- 
ponent by component. How can you cover this for serial or parallel 
statements? Obviously, there are many tricks that are time (or
series) dependent in array operations. How do they relate to
dualities and equivalences, etc.?


Iverson: Parallel operation is implied by any vector operation; 
serial operation can be made explicit by a program showing the 
specified sequence of operations on components. Distinctions of 
this type (employing the present notation) are made clear in 
Falkoff’s “Algorithms for Parallel Search Memories” [J. ACM, 
Oct. 1962]. 


More complex simultaneity can be expressed by a collection of 
programs operating concurrently, all mutually independent but 
for interaction through certain (interlock) variables common to 
some two or more programs. Explicit dependence on real time can 
be introduced by incorporating, as one of this collection of pro- 
grams, a program describing a clock (i.e., oscillator)-driven 
counter. 

Gorn: Does your generalized operator notation for matrices 
lead to a simpler proof of the generalized Laplace expansion of 
determinants? 

Iverson: For a given logical vector u, the Laplace expansion 
of the determinant 6A can be expressed as 


5A = (+:/((00/S//A) X (80/S'//A) X pS) X p'u, 


where S is a logical matrix whose rows represent all partitions of 
weight +/u, where p'v = p(v/ıl, v/1) is the “parity” of the 
logical vector v, and pp is the parity of the permutation vector p, 
defined as +1 or —1 according as the parity of p is even or odd. 
Since 


5A = +:/((X/NP'/A) X EP) 

(where P is the matrix whose (vA)! rows exhaust all permutations 
of dimension vA, and where compression by a logical matrix U is 
defined in the obvious way as the catenation of the vectors U*/A?), 
then the usual proof of the Laplace expansion (i.e. showing that a 
typical term of either expansion occurs in the other) can be carried 
through directly with the aid of the following fact: if u is any 
logical vector and p is a permutation of like dimension, then 
there exists a unique triple v, q, r, such that 


Ni = u/v//N?, and N’! = u/v//N?. 
[The vectors v and u are clearly related by the expressions v = 


Sa, and u = vX SP, and moreover, pp = (p'u) X (p’v) X 


(oq) X (pr)). 


The special matrices occurring in the foregoing can all be 
specified formally in terms of the matrix T (b, n) defined as follows: 
Ti € V(b), T = b’, vT = n, and bLT = Y, where bLx 
denotes the base-b value of the vector x. Thus S= 
(+ /u = +/M)//M, where M = T(2, vA) and P = (A /o/M)//M, 
where M = TOA, vA), and o/x is the set selection operation 
1, p. 23]. 


Moreover, the parity function pp may be defined formally as 
pp = U— u, where u = 2| +/01/(p > p). 


Dijkstra: How would you represent a more complex operation, 
for example, the sum of all elements of a matrix M which are 
equal to the sum of the corresponding row and column indices? 


Iverson: ++/(M = U 44) //M 
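The computation Dijkstra asks for can be spelled out in a conventional language for comparison. The Python sketch below is not Iverson's expression; it is an illustration that assumes row and column indices start at 1, and the matrix `M` is made-up sample data.

```python
# Sum the elements of M that equal the sum of their row and column indices.
M = [[2, 5, 4],
     [3, 9, 5],
     [4, 5, 0]]

total = sum(value
            for i, row in enumerate(M, start=1)      # 1-origin row index
            for j, value in enumerate(row, start=1)  # 1-origin column index
            if value == i + j)
print(total)
```

The structure is the same as in the array formulation: build a logical mask by comparing M with a matrix of index sums, select the elements under the mask, and reduce with plus.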
The common conventions for the evaluation of unparenthesized
expressions include the rules that (1) in a multilevel expression such as

$\frac{a+b}{c+d}$

each line is evaluated before the function connecting the lines
is evaluated; (2) subject to the first rule, multiplication and division
are performed before addition and subtraction; (3) subject to the first
two rules, evaluation proceeds from left to right; (4) division can be
represented by three distinct but synonymous symbols ($a \div b$, $a/b$,
and $\frac{a}{b}$); and (5) multiplication can be represented by two distinct but
synonymous symbols ($a \times b$ and $a \cdot b$), or the symbol can be elided.
The one convention used in this book is that (subject to parentheses)
evaluation proceeds from right to left. This appendix treats the major
reasons for this choice.


The common conventions are usually defended on the grounds 
that they are simple and well known and that their use significantly 
simplifies the reading and writing of expressions. Because of the 
familiarity of certain common constructions, these conventions appear 
simple, but this simplicity is illusory and vanishes on closer examina- 
tion. Inquiries among students and colleagues have shown such dis- 
agreement on the interpretation of the conventions as to dispel the 
notion that they are well known. Finally, the much simpler conven- 
tion adopted in this text proves at least as effective in simplifying the 
reading and writing of expressions. 


Consider, for example, the expressions $x \div y \times z$ and $x \div yz$.
According to the rules, both are equivalent to the expression $(x \div y) \times z$.
However, $yz$ is frequently used as an expression for multiplication
which is performed first regardless of other rules. Furthermore, the
dot notation for multiplication yields the expression $x \div y \cdot z$, which
(according to the interpretations encountered) seems to fall midway
between the other cases. Proponents of the common convention
protest that such expressions would be parenthesized anyway for clarity;
but then the convention seems to lose most of its value.


Matters are further complicated by the alternative notations for
division. For example, $x \div y \div z$ and $x \div y/z$ should have the same
interpretation, but frequently they do not. Similarly, the formally
equivalent expressions $x \div a \div y \div b$ and $x \div a/y \div b$ frequently
receive different interpretations. It is interesting to consider the
different possible evaluations of the following expressions which,
according to rule 3, are equivalent:


$x \div y \times z$    $x \div y \cdot z$    $x \div yz$
$x/y \times z$    $x/y \cdot z$    $x/yz$


The common convention also appears to include a number of
tacit rules that writers obey automatically. For example, $xy$ may be
written for $x \times y$, and any variable should be replaceable by a
numerical value. However, while the expression $3y$ is commonplace, most
readers would find the expressions $x3$ and $34$ jarring and perhaps
inadmissible as expressions for $x \times 3$ and $3 \times 4$.


In spite of these defects, the common conventions are reasonably
convenient when applied to simple expressions involving only the
four basic arithmetic functions, but more serious difficulties arise in
their haphazard extension to other functions. For example, the
expression sin n × cos m would be interpreted as (sin n) × (cos m), whereas
sin n × m would be interpreted as sin (n × m). Moreover, the
expression $a^{b^{c^d}}$ is usually interpreted as $a^{(b^{(c^d)})}$ rather than as $((a^b)^c)^d$ (that is,
from right to left rather than from left to right according to rule 3),
apparently because the latter case can be expressed by the equivalent
expression $a^{b \times c \times d}$. In the notation used in this book the first case
would be expressed as either a*b*c*d or */a,b,c,d and the
second as either a*b×c×d or a*×/b,c,d.


As further functions are introduced (for example, absolute value, 
maximum, minimum, residue, the relations, logical functions, and the 
circular functions), the complexity grows and the utility of any relative 
priority of execution among the functions decreases. Mathematical 
texts handle this problem either by liberal use of parentheses or by 
ad hoc (and frequently unstated) conventions. Programming lan- 
guages, which must face the issue more formally, have usually treated 
the problem by establishing a hierarchy of priorities among the func- 
tions such that any function is evaluated before all others having lower 
priorities. Such a system is usually very complex (Algol, one of the 
best known, has nine priority levels) and can therefore be used effi- 
ciently only by a programmer who employs it frequently. The occa- 
sional (and the prudent) programmer avoids the whole issue by 
including all the parentheses that would have been required with no 
convention. 


Further examples of the complexity and ambiguity of the com- 
mon conventions could be easily adduced. However, the skeptical 
reader will find it more instructive to scan various textbooks trying to 
formulate precisely the rules used (stated or implied) and applying 
them rigorously. 


The question of the efficacy of the common convention in re- 
ducing the need for parentheses will now be addressed. Any conven- 
tion will reduce the need for parentheses, but the important question 
is how the common convention compares in this respect with other 
conventions, and in particular with the notation used in this text. 

The utility of the common convention stands forth well in the
expression for a polynomial. For example, in the expression


$ax^p + bx^q + cx^r$


it would be awkward to have to enclose each term in parentheses.
However, in the present notation this would be written as


+/(a,b,c)×x*p,q,r


or, if the vectors of coefficients and exponents were denoted by c and e
respectively, then it would be written as


+/c×x*e


These forms make clear the structure of the polynomial while
permitting suppression of detail by using vectors: the corresponding
expression in conventional notation is


$\sum_{i=1}^{n} c_i x^{e_i}$
where n is the magic variable that denotes the dimensions of all vectors.


The expression (derived in Chapter 4) for the efficient evaluation
of a polynomial such as (a,b,c,d,e,f) Π x provides a further
example. In the notation used in this text it appears (without parentheses)
as


(a,b,c,d,e,f) Π x = a+x×b+x×c+x×d+x×e+x×f


whereas in the common convention it would appear as


(a,b,c,d,e,f) Π x
= a+x×(b+x×(c+x×(d+x×(e+x×f))))
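The nested form is what is now usually called Horner's rule, and its right-to-left structure can be made explicit in code. The Python sketch below is an illustration with hypothetical function names; it compares the nested evaluation, working inward from the rightmost coefficient, with a direct sum of terms.

```python
from functools import reduce

def poly_direct(coeffs, x):
    """Sum of coeffs[k] * x**k, with coeffs listed from the constant term up."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def poly_right_to_left(coeffs, x):
    """Evaluate a + x*(b + x*(c + ...)), starting from the rightmost coefficient."""
    return reduce(lambda acc, c: c + x * acc,
                  reversed(coeffs[:-1]),
                  coeffs[-1])

coeffs = [1, 2, 3, 4, 5, 6]   # stands in for (a, b, c, d, e, f)
print(poly_direct(coeffs, 2), poly_right_to_left(coeffs, 2))
```

The accumulator in `poly_right_to_left` always holds the value of "the entire expression following", which is exactly how the unparenthesized right-to-left form is read.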


Further examples could be adduced, but again the skeptical 
reader will find it more instructive to formulate a set of precise rules 
based on the common convention and to translate into the resulting 
notation the expressions appearing in the present text. 


There is one further argument against imposing a priority among
functions in the present notation. If F and G are dyadic functions,
then the expression F/x G y would have either of two interpretations
(that is, (F/x) G y or F/(x G y)), depending upon the relative
priorities of F and G. These two interpretations differ markedly in form
and would therefore lead to confusion. For example, +/x×y would be
interpreted as +/(x×y) whereas the similar expression ×/x+y
would be interpreted as (×/x)+y. Similar remarks apply to the matrix
product M F.G N (defined in Chapter 9).


The reasons for choosing a right-to-left instead of a left-to-right 
convention are: 


1. The usual mathematical convention of placing a monadic
function to the left of its argument leads to a right-to-
left execution for monadic functions; for example, F G x
= F (G x).

2. The notation F/x for reduction (by any dyadic function F)
tends to require fewer parentheses with a right-to-left
convention. For example, expressions such as +/(x×y) or
+/(u/x) tend to occur more frequently than (+/x)×y and
(+/u)/x.

3. An expression evaluated from right to left is the easiest to
read from left to right. For example, the expression

a+x×b+x×c+x×d+x×e+x×f

(for the efficient evaluation of a polynomial) is read as a plus
the entire expression following, or as a plus x times the
following expression, or as a plus x times b plus the following
expression, and so on.

4. In the definition

$F/x = x_1 \, F \, x_2 \, F \, x_3 \, F \cdots F \, x_{\rho x}$

the right-to-left convention leads to a more useful definition
for nonassociative functions F than does the left-to-right
convention. For example, −/x denotes the alternating sum
of the components of x, whereas in a left-to-right convention
it would denote the first component minus the sum of the
remaining components. Thus if d is the vector of decimal
digits representing the number n, then the value of the
expression 0 = 9|+/d determines the divisibility of n by 9;
in the right-to-left convention, the similar expression
0 = 11|−/d determines divisibility by 11.
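The difference between the two conventions for a nonassociative function such as minus can be seen in a few lines. The Python sketch below is an illustration with hypothetical helper names; the number 2178 is an arbitrary example chosen because it is divisible by both 9 and 11.

```python
from functools import reduce

def reduce_right(f, xs):
    """x1 f (x2 f (... f xn)): right-to-left reduction of a nonempty list."""
    return reduce(lambda acc, v: f(v, acc), reversed(xs[:-1]), xs[-1])

def reduce_left(f, xs):
    """((x1 f x2) f x3) ... f xn: left-to-right reduction of a nonempty list."""
    return reduce(f, xs)

def minus(a, b):
    return a - b

def plus(a, b):
    return a + b

d = [2, 1, 7, 8]                       # decimal digits of n = 2178

alternating = reduce_right(minus, d)   # 2 - 1 + 7 - 8
first_minus_rest = reduce_left(minus, d)   # 2 - (1 + 7 + 8)

divisible_by_9 = reduce_right(plus, d) % 9 == 0
divisible_by_11 = alternating % 11 == 0
print(alternating, first_minus_rest, divisible_by_9, divisible_by_11)
```

For an associative function such as plus the two reductions agree, which is why the choice of direction only matters for functions like minus and divide.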
A.1 INTRODUCTION


Although few mathematicians would quarrel with the
proposition that the algebraic notation taught in high
school is a language (and indeed the primary language of
mathematics), yet little attention has been paid to the
possible implications of such a view of algebra. This paper
adopts this point of view to illuminate the inconsistencies
and deficiencies of conventional notation and to explore the
implications of analogies between the teaching of natural
languages and the teaching of algebra. Based on this
analysis it presents a simple and consistent algebraic
notation, illustrates its power in the exposition of some
familiar topics in algebra, and proposes a basis for an
introductory course in algebra. Moreover, it shows how a
computer can, if desired, be used in the teaching process,
since the language proposed is directly usable on a computer
terminal.


A.2 ARITHMETIC NOTATION 


We will first discuss the notation of arithmetic,
i.e., that part of algebraic notation which does not involve
the use of variables. For example, the expressions 3-4 and
(3+4)-(5+6) are arithmetic expressions, but the expressions
3-X and (X+4)-(Y+6) are not. We will now explore the
anomalies of arithmetic notation and the modifications
needed to remove them.


Functions and symbols for functions. The importance of
introducing the concept of "function" rather early in the
mathematical curriculum is now widely recognized.
Nevertheless, those functions which the student encounters
first are usually referred to not as "functions" but as
"operators". For example, absolute value (|-3|) and
arithmetic negation (-3) are usually referred to as
operators. In fact, most of the functions which are so
fundamental and so widely used that they have been assigned
some graphic symbol are commonly called operators
(particularly those functions such as plus and times which
apply to two arguments), whereas the less common functions
which are usually referred to by writing out their names
(e.g., Sin, Cos, Factorial) are called functions.

This practice of referring to the most common and most
elementary functions as operators is surely an unnecessary
obstacle to the understanding of functions when that term is
first applied to the more complex functions encountered.
For this reason the term "function" will be used here for
all functions regardless of the choice of symbols used to
represent them.


The functions of elementary algebra are of two types,
taking either one argument or two. Thus addition is a
function of two arguments (denoted by X+Y) and negation is a
function of one argument (denoted by -Y). It would seem
both easy and reasonable to adopt one form for each type of
function as suggested by the foregoing examples, that is,
the symbol for a function of two arguments occurs between
its arguments, and the symbol for a function of one argument
occurs before its argument. Conventional notation displays
considerable anarchy on this point:


1. Certain functions are denoted by any one of
several symbols which are supposed to be synonymous
but which are, however, used in subtly different ways.
For example, in conventional algebra X×Y and XY both
denote the product of X and Y. However, one would
write either 3×Y or 3X or X×3, or 3×4, but would not
likely accept X3 as an expression for X×3, nor 3 4 as
an expression for 3×4. Similarly, X÷Y and X/Y are
supposed to be synonymous, but in the sentence "Reduce
8/6 to lowest terms", the symbol / does not stand for
division.

2. The power function has no symbol, and is denoted
by position only, as in $X^N$. The same notation is
often used to denote the Nth element of a family or
array X.

3. The remainder function (that is, the integer
remainder on dividing X into Y) is used very early in
arithmetic (e.g., in factoring) but is commonly not
recognized as a function on a par with addition,
division, etc., nor assigned a symbol. Because the
remainder function has no symbol and is commonly
evaluated by the method of long division, there is a
tendency to confuse it with division. This confusion
is compounded by the fact that the term "quotient"
itself is ambiguous, sometimes meaning the quotient
and sometimes the integer part of the quotient.


4. The symbol for a function of one argument 
sometimes occurs before the argument (as in -4) but 
may also occur after it (as in 4! for factorial 4) or 
on both sides (as in |X| for absolute value of X). 


Table A.1 shows a set of symbols which can be used in
a simple consistent manner to denote the functions mentioned
thus far, as well as a few other very useful basic functions
such as maximum, minimum, integer part, reciprocal, and
exponential. The table shows two uses for each symbol, one
to denote a monadic function (i.e., a function of one
argument), and one to denote a dyadic function (i.e., a
function of two arguments). This is simply a systematic
exploitation of the example set by the familiar use of the
minus sign, either as a dyadic function (i.e., subtraction
as in 4-3) or as a monadic function (i.e., negation as in
-3). No function symbol is permitted to be elided; for
example, X×Y may not be written as XY.


Monadic form fB                      f    Dyadic form AfB

Definition or example    Name             Name        Definition or example

+3 ↔ 0+3                 Plus        +    Plus        2+3.2 ↔ 5.2
-3 ↔ 0-3                 Negative    -    Minus       2-3.2 ↔ ¯1.2
×3 ↔ (3>0)-(3<0)         Signum      ×    Times       2×3.2 ↔ 6.4
÷3 ↔ 1÷3                 Reciprocal  ÷    Divide      2÷3.2 ↔ 0.625
⌈3.14 ↔ 4, ⌈¯3.14 ↔ ¯3   Ceiling     ⌈    Maximum     3⌈7 ↔ 7
⌊3.14 ↔ 3, ⌊¯3.14 ↔ ¯4   Floor       ⌊    Minimum     3⌊7 ↔ 3
*3 ↔ (2.71828..)*3       Exponential *    Power       2*3 ↔ 8
⍟*5 ↔ 5 ↔ *⍟5            Natural     ⍟    Logarithm   10⍟3 ↔ Log 3 base 10
                         logarithm                    10⍟3 ↔ (⍟3)÷⍟10
|¯3.14 ↔ 3.14            Magnitude   |    Remainder   3|8 ↔ 2

                              Table A.1


A little experimentation with the notation of Table
A.1 will show that it can be used to express clearly a
number of matters which are awkward or impossible to express
in conventional notation. For example, X÷Y is the quotient
of X divided by Y; either ⌊(X÷Y) or (X-(Y|X))÷Y yields the
integer part of the quotient of X divided by Y; and X⌈(-X)
is equivalent to |X.
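
These identities can be tried out numerically. The following Python sketch uses illustrative function names and assumes a positive divisor Y, for which Python's `x % y` agrees with the remainder function Y|X of the table.

```python
import math


def integer_quotient_floor(x, y):
    """Counterpart of ⌊(X÷Y): floor of the quotient."""
    return math.floor(x / y)


def integer_quotient_residue(x, y):
    """Counterpart of (X-(Y|X))÷Y: remove the remainder, then divide."""
    return (x - (x % y)) / y


def magnitude_via_max(x):
    """Counterpart of X⌈(-X): the larger of X and its negation."""
    return max(x, -x)


# Compare both forms over a small grid of arguments (positive divisors only).
quotients_agree = all(
    integer_quotient_floor(x, y) == integer_quotient_residue(x, y)
    for x in range(-20, 21)
    for y in range(1, 8)
)
magnitudes_agree = all(magnitude_via_max(x) == abs(x) for x in range(-20, 21))
```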


In conventional notation the symbols <, ≤, =, ≥, >
and ≠ are used to state relations among quantities; for
example, the expression 3<4 asserts that 3 is less than 4.
It is more useful to employ them as symbols for dyadic
functions defined to yield the value 1 if the indicated
relation actually holds, and the value zero if it does not.
Thus 3<4 yields the value 1, and 5+(3≤4) yields the value 6.
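
A relation treated as a function with result 1 or 0 can be sketched in Python as follows; the names `less` and `less_equal` are illustrative only.

```python
def less(a, b):
    """Yield 1 if a is less than b, and 0 otherwise."""
    return int(a < b)


def less_equal(a, b):
    """Yield 1 if a is less than or equal to b, and 0 otherwise."""
    return int(a <= b)


first = less(3, 4)             # the relation holds, so the value is 1
second = 5 + less_equal(3, 4)  # the result can enter further arithmetic
```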


Arrays. The ability to refer to collections or arrays of
items is an important element in any natural language and is
equally important in mathematics. The notation of vector
algebra embodies the use of arrays (vectors, matrices,
3-dimensional arrays, etc.) but in a manner which is
difficult to learn and limited primarily to the treatment of
linear functions. Arrays are not normally included in
elementary algebra, probably because they are thought to be
difficult to learn and not relevant to elementary topics.


A vector (that is, a 1-dimensional array) can be
represented by a list of its elements (e.g., 1 3 5 7) and
all functions can be assumed to be applied
element-by-element. For example:


      1 2 3 4 × 4 3 2 1
4 6 6 4


Similarly:


[one expression and its result are illegible in the source]

      !1 2 3 4
1 2 6 24
      1 2 3 4 * 2
1 4 9 16
      2 * 1 2 3 4
2 4 8 16
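
The element-by-element rule can be imitated in Python with plain lists. This is only a sketch: the helpers `each1` and `each2` are hypothetical names, and the extension of a single number to match a vector is an assumption read off from the examples 1 2 3 4 * 2 and 2 * 1 2 3 4.

```python
import math
from operator import mul, pow as power


def each2(f, a, b):
    """Apply a dyadic function element-by-element.

    A single number on either side is paired with every element
    of the vector on the other side.
    """
    a_is_vec = isinstance(a, (list, tuple))
    b_is_vec = isinstance(b, (list, tuple))
    if a_is_vec and b_is_vec:
        if len(a) != len(b):
            raise ValueError("vectors must have the same length")
        return [f(x, y) for x, y in zip(a, b)]
    if a_is_vec:
        return [f(x, b) for x in a]
    if b_is_vec:
        return [f(a, y) for y in b]
    return f(a, b)


def each1(f, b):
    """Apply a monadic function to each element of a vector."""
    return [f(y) for y in b] if isinstance(b, (list, tuple)) else f(b)


products = each2(mul, [1, 2, 3, 4], [4, 3, 2, 1])
factorials = each1(math.factorial, [1, 2, 3, 4])
squares = each2(power, [1, 2, 3, 4], 2)
powers_of_two = each2(power, 2, [1, 2, 3, 4])
```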


In addition to applying a function to each element of 
an array, it is also necessary to be able to apply some 
specified function to the collection itself. For example, 
"Take the sum of all elements", or "Take the product of all 
elements", or "Take the maximum of all elements". This can 
be denoted as follows: 


      +/2 5 3 2
12
      ×/2 5 3 2
60
      ⌈/2 5 3 2
5

The rules for using such vectors are simple and 
obvious from the foregoing examples. Vectors are relevant 
to elementary mathematics in a variety of ways. For 
example: 

1. They can be used (as in the foregoing examples) to
display the patterns produced by various functions when
applied to certain patterns of arguments.

2. They can be used to represent points in coordinate
geometry. Thus 5 7 19 and 2 3 7 represent two points,
5 7 19 - 2 3 7 yields 3 4 12, the displacement between
them, and (+/(5 7 19 - 2 3 7)*2)*.5 yields 13, the
distance between them.

3. They can be used to represent rational numbers. Thus if
3 4 represents the fraction three-fourths, then 3 4×5 6
yields 15 24, the product of the fractions represented
by 3 4 and 5 6. Moreover, ÷/3 4 and ÷/5 6 and ÷/15 24
yield the actual numbers represented.

4. A polynomial can be represented by its vector of
coefficients and vector of exponents. For example, the
polynomial with coefficients 3 1 2 4 and exponents
0 1 2 3 can be evaluated for the argument 5 by the
following expression:

      +/3 1 2 4 × 5 * 0 1 2 3
558
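
The reductions and the uses of vectors listed above can be reproduced in a few lines of Python. This is a sketch under the assumption that plain lists stand in for vectors; `reduce_right` is a hypothetical helper that mirrors the right-to-left reading of f/x.

```python
import math
from functools import reduce
from operator import add, mul, truediv


def reduce_right(f, xs):
    """Right-to-left reduction: x1 f (x2 f (... f xn))."""
    xs = list(xs)
    return reduce(lambda acc, x: f(x, acc), reversed(xs[:-1]), xs[-1])


v = [2, 5, 3, 2]
total = reduce_right(add, v)     # +/2 5 3 2
product = reduce_right(mul, v)   # ×/2 5 3 2
largest = reduce_right(max, v)   # ⌈/2 5 3 2

# Points in coordinate geometry: displacement and distance.
p, q = [5, 7, 19], [2, 3, 7]
displacement = [a - b for a, b in zip(p, q)]
distance = math.sqrt(sum(d ** 2 for d in displacement))

# Rational numbers as numerator-denominator pairs.
fraction_product = [a * b for a, b in zip([3, 4], [5, 6])]
value_of_product = reduce_right(truediv, fraction_product)  # ÷/15 24

# Polynomial with coefficients 3 1 2 4 and exponents 0 1 2 3 at 5.
coefficients, exponents = [3, 1, 2, 4], [0, 1, 2, 3]
polynomial_value = sum(c * 5 ** e for c, e in zip(coefficients, exponents))
```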

Constants. Conventional notation provides means for writing
any positive constant (e.g., 17 or 3.14) but there is no
distinct notation for negative constants, since the symbol -
occurring in a number like -35 is indistinguishable from the
symbol for the negation function. Thus negative thirty-five
is written as an expression, which is much as if we
neglected to have symbols for five and zero because
expressions for them could be written in a variety of ways
such as 8-3 and 8-8.


It seems advisable to follow Beberman [1] in using a 
raised minus sign to denote negative numbers. For example: 


[example illegible in the source]


Conventional notation also provides no convenient way
to represent numbers which are easily expressed in
expressions of the form $2.14\times10^{8}$ or $3.265\times10^{-9}$. A useful
practice widely used in computer languages is to replace the
symbols ×10 by the symbol E (for exponent) as
follows: 2.14E8 and 3.265E¯9.


Order of execution. The order of execution in an algebraic 
expression is commonly specified by parentheses. The rules 
for parentheses are very simple, but the rules which apply 
in the absence of parentheses are complex and chaotic. They
are based primarily on a hierarchy of functions (e.g., the 
power function is executed before multiplication, which is 
executed before addition) which has apparently arisen 
because of its convenience in writing polynomials. 


Viewed as a matter of language, the only purpose of
such rules is the potential economy in the use of
parentheses and the consequent gain in readability of
complex expressions. Economy and simplicity can be achieved
by the following rule: parentheses are obeyed as usual and
otherwise expressions are evaluated from right to left with
all functions being treated equally. The advantages of this
rule and the complexity and ambiguity of conventional rules
are discussed in Berry [2], page 27 and in Iverson [3],
Appendix A. Even polynomials can be conveniently written
without parentheses if use is made of vectors. For example,
the polynomial in X with coefficients 3 1 2 4 can be written
without parentheses as +/3 1 2 4 × X * 0 1 2 3. Moreover,
Horner's expression for the efficient evaluation of this
same polynomial can also be written without parentheses as
follows:


      3+X×1+X×2+X×4
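
The right-to-left rule is simple enough to sketch in Python. The evaluator below is a hypothetical illustration, not an APL interpreter: it assumes a flat list of tokens that alternates numbers and dyadic function symbols, with no parentheses and no monadic functions.

```python
import operator

# Dyadic functions, all treated equally (no hierarchy).
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "×": operator.mul,
    "÷": operator.truediv,
    "*": operator.pow,
}


def eval_right_to_left(tokens):
    """Evaluate number, symbol, number, ... from right to left."""
    result = tokens[-1]
    for i in range(len(tokens) - 2, 0, -2):
        result = FUNCTIONS[tokens[i]](tokens[i - 1], result)
    return result


def horner(coefficients, x):
    """Horner's scheme, the reading of 3+X×1+X×2+X×4 under the rule."""
    result = coefficients[-1]
    for c in reversed(coefficients[:-1]):
        result = c + x * result
    return result


def power_sum(coefficients, x):
    """The vector form +/C × X * 0 1 2 3."""
    return sum(c * x ** e for e, c in enumerate(coefficients))


X = 5
horner_tokens = [3, "+", X, "×", 1, "+", X, "×", 2, "+", X, "×", 4]
value = eval_right_to_left(horner_tokens)
```

Under this rule 2×3+4 is read as 2×(3+4), which is why Horner's expression needs no parentheses.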


Analogies with natural language. The arithmetic expression
3×4 can be viewed as an order to do something, that is,
multiply the arguments 3 and 4. Similarly, a more complex
expression can be viewed as an order to perform a number of
operations in a specified order. In this sense, an
arithmetic expression is an imperative sentence, and a
function corresponds to an imperative verb in natural
language. Indeed, the word "function" derives from the
Latin verb "fungi" meaning "to perform".


This view of a function does not conflict with the
usual mathematical definition as a specified correspondence
between the elements of domain and range, but rather
supplements this static view with a dynamic view of a
function as that which produces the corresponding value for
any specified element of the domain.


If functions correspond to imperative verbs, then 
their arguments (the things upon which they act) correspond
to nouns. In fact, the word "argument" has (or at least 
had) the meaning topic, theme, or subject. Moreover, the 
positive integers, being the most concrete of arithmetical 
objects, may be said to correspond to proper nouns. 


What are the roles of negative numbers, rational
numbers, irrational numbers, and complex numbers? The
subtraction function, introduced as an inverse to addition,
yields positive integers in some cases but not in others,
and negative numbers are introduced to refer to the results
in these cases. In other words, a negative number refers to
a process or the result of a process, and is therefore
analogous to an abstract noun. For example, the abstract
noun "justice" refers not to some concrete object (examples
of which one may point to) but to a process or result of a
process. Similarly, rational and complex numbers refer to
the results of processes: division, and finding the zeros
of polynomials, respectively.


A.3 ALGEBRAIC NOTATION 


Names. An expression such as 3×X can be evaluated only if
the variable X has been assigned an actual value. In one
sense, therefore, a variable corresponds to a pronoun whose
referent must be made clear before any sentence including it
can be fully understood. In English the referent may be
made clear by an explicit statement, but is more often made
clear by indirection (e.g., "See the door. Close it."), or
by context.


In conventional algebra, the value assigned to a
variable name is usually made clear informally by some
statement such as "Let X have the value 6" or "Let X=6".
Since the equal symbol (that is, '=') is also used in other
ways, it is better to avoid its use for this purpose and to
use a distinct symbol as follows:


      X←6
      Y←3×4
      X+Y
18
      (X-3)×(X-5)
3


Assigning names to expressions. In the foregoing example,
the expression (X-3)×(X-5) was written as an instruction to
evaluate the expression for a particular value already
assigned to X. One also writes the same expression for the
quite different notion "Consider the expression (X-3)×(X-5)
for any value which might later be assigned to the argument
X." This is a distinct notion which should be represented
by distinct notation. The idea is to be able to refer to
the expression and this can be done by assigning a name to
it. The following notation serves:


      ∇Z←G X
      Z←(X-3)×(X-5)∇


The ∇'s indicate that the symbols between them define
a function; the first line shows that the name of the
function is G. The names X and Z are dummy names standing
for the argument and result, and the second line shows how
they are related.


Following this definition, the name G may be used as a 
function. For example: 


      G 6
3

      G 1 2 3 4 5 6 7
8 3 0 ¯1 0 3 8


Iterative functions can be defined with equal ease as 
shown in Chapter 12. 
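
The named expression G has a direct counterpart in a Python function definition. In this sketch the extension to a list argument is spelled out by hand, since Python does not apply arithmetic element-by-element to lists.

```python
def G(x):
    """Counterpart of ∇Z←G X with Z←(X-3)×(X-5)."""
    def expression(v):
        return (v - 3) * (v - 5)

    # A list is treated as a vector; anything else as a single number.
    if isinstance(x, (list, tuple)):
        return [expression(v) for v in x]
    return expression(x)


single = G(6)
pattern = G([1, 2, 3, 4, 5, 6, 7])
```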
Form of names. If the variables occurring in algebraic
sentences are viewed simply as names, it seems reasonable to
employ names with some mnemonic significance as illustrated
by the following sequence:


      LENGTH←6
      WIDTH←5
      AREA←LENGTH×WIDTH
      HEIGHT←4
      VOLUME←AREA×HEIGHT


This is not done in conventional notation, apparently
because it is ruled out by the convention that the
multiplication sign may be elided; that is, AREA cannot be
used as a name because it would be interpreted as A×R×E×A.


This same convention leads to other anomalies as well,
some of which were discussed in the section on arithmetic
notation. The proposal made there (i.e., that the
multiplication sign cannot be elided) will permit variable
names of any length.


A.4 ANALOGIES WITH THE TEACHING OF NATURAL LANGUAGE 


If one views the teaching of algebra as the teaching 
of a language, it appears remarkable how little attention is 
given to the reading and writing of algebraic sentences, and 
how much attention is given to identities, that is, to the
analysis of sentences with a view to determining other
equivalent sentences; e.g., "Simplify the expression
(X-4)×(X+4)." It is possible that this emphasis accounts
for much of the difficulty in teaching algebra, and that the
teaching and learning processes in natural languages may
suggest a more effective approach.


In the learning of a native language one can 
distinguish the following major phases: 


1. An informal phase, in which the child learns to
communicate in a combination of gestures, single words,
etc., but with no attempt to form grammatical sentences.


2. A formal phase, in which the child learns to communicate
in formal sentences. This phase is essential because it
is difficult or impossible to communicate complex
matters with precision without imposing some formal
structure on the language.


3. An analytic phase, in which one learns to analyze
sentences with a view to determining equivalent (and
perhaps "simpler" or "more effective") sentences. The
extreme case of such analysis is Aristotelian logic,
which attempts a formal analysis of certain classes of
sentences. More practical everyday cases occur every
time one carefully reads a composition and suggests
alternative sentences which convey the same meaning in a
briefer or simpler form.


The same phases can be distinguished in the teaching
of algebraic notation:

1. An informal phase in which one issues an instruction to
add 2 and 3 in any way which will be understood. For
example:

      2+3        Add 2 and 3

       2           2
       3          +3
      ---         ---

      Add two and three
      Add // and ///


The form of the expression is unimportant, provided that
the instruction is understood.


2. A formal phase in which one emphasizes proper sentence
structure and would not accept expressions such
        2
as 6 × +3 or 6×(add two and three) in lieu of 6×(2+3).
Again, adherence to certain structural rules is
necessary to permit the precise communication of complex
matters.


3. An analytic phase in which one learns to analyze 
sentences with a view to establishing certain relations 
(usually identity) among them. Thus one learns not only 
that 3+4 is equal to 4+3 but that the sentences X+Y and 
Y+X are equivalent, that is, yield the same result 
whatever the meanings assigned to the pronouns X and Y. 


In learning a native language, a child spends many 
years in the informal and formal phases (both in and out of 
school) before facing the analytic phase. By this time she 
has easy familiarity with the purposes of a language and the 
meanings of sentences which might be analyzed and 
transformed. The situation is quite different in most 
conventional courses in algebra - very little time is spent 
in the formal phase (reading, writing and "understanding" 
formal algebraic sentences) before attacking identities 
(such as commutativity, associativity, distributivity, 
etc.). Indeed, students often do not realize that they
might quickly check their work in "simplification" by 
substituting certain values for the variables occurring in 
the original and derived expressions and comparing the 
evaluated results to see if the expressions have the same 
"meaning", at least for the chosen values of the variables. 

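The check by substitution mentioned above is easy to mechanize. The following Python sketch uses hypothetical names; the "simplified" forms are assumed student answers to the exercise "Simplify the expression (X-4)×(X+4)". Agreement on the chosen values does not prove an identity, but disagreement exposes an error at once.

```python
def agree_on(expression_a, expression_b, values):
    """True if both expressions yield the same result for every value."""
    return all(expression_a(x) == expression_b(x) for x in values)


def original(x):
    return (x - 4) * (x + 4)


def simplified(x):
    # A correct simplification of the original expression.
    return x * x - 16


def faulty(x):
    # An assumed slip of the kind a quick check would catch.
    return x * x - 8


sample_values = range(-5, 6)
correct_passes = agree_on(original, simplified, sample_values)
faulty_passes = agree_on(original, faulty, sample_values)
```
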

It is interesting to speculate on what would happen if 
a native language were taught in an analogous way, that is, 
if children were forced to analyze sentences at a stage in 
their development when their grasp of the purpose and 
meaning of sentences were as shaky as the algebra student's 
grasp of the purpose and meaning of algebraic sentences. 
Perhaps they would fail to learn to converse, just as many 
students fail to learn the much simpler task of reading. 


Another interesting aspect of learning the
non-analytic aspects of a native language is that much (if
not most) of the motivation comes not from an interest in
language, but from the intrinsic interest of the material
(in children's stories, everyday dialogue, etc.) for which
it is used. It is doubtful that the same is true in
algebra: ruling out statements of an analytic nature
(identities, etc.), how many "interesting" algebraic
sentences does a student encounter?


KENNETH E. IVERSON 


The use of arrays can open up the possibility of much 
more interesting algebraic sentences. This can apply both 
to sentences to be read (that is, evaluated) and written by 
students. For example, the statements: 


      2×1 2 3 4 5
      2*1 2 3 4 5
      [two expressions illegible in the source]
      1 2 3 4 5*2
      1 2 3 4 5×5 4 3 2 1

produce interesting patterns and therefore have more
intrinsic interest than similar expressions involving only
single quantities. For example, the last expression can be
construed as yielding a set of possible areas for a
rectangle having a fixed perimeter of 12.


More interesting possibilities are opened up by 
certain simple extensions of the use of arrays. One example 
of such extensions will be treated here. This extension 
allows one to apply any dyadic function to two vectors A and 
B so as to obtain not simply the element-by-element product 
produced by the expression A×B, but a table of all products
produced by pairing each element of A with each element of 
B. For example: 


      A←1 2 3
      B←2 3 5 7

      A∘.×B            A∘.+B            A∘.*B
 2  3  5  7       3  4  6  8       1   1   1    1
 4  6 10 14       4  5  7  9       4   8  32  128
 6  9 15 21       5  6  8 10       9  27 243 2187
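
The three tables can be regenerated with a short Python sketch. The helper name `outer` is an illustrative choice; nested lists stand in for the tables.

```python
from operator import add, mul, pow as power


def outer(f, a, b):
    """Table of f applied to each element of a paired with each element of b."""
    return [[f(x, y) for y in b] for x in a]


A = [1, 2, 3]
B = [2, 3, 5, 7]

times_table = outer(mul, A, B)    # A∘.×B
plus_table = outer(add, A, B)     # A∘.+B
power_table = outer(power, A, B)  # A∘.*B
```

Each row of a table corresponds to one element of A and each column to one element of B, so any dyadic function can be substituted for `mul` without changing the helper.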


If S←1 2 3 4 5 6 7, then the following expressions
yield an addition table, a multiplication table, a
subtraction table, a maximum table, an "equal" table, and a
"greater than or equal" table:


      S∘.+S                         S∘.⌈S
 2  3  4  5  6  7  8          1 2 3 4 5 6 7
 3  4  5  6  7  8  9          2 2 3 4 5 6 7
 4  5  6  7  8  9 10          3 3 3 4 5 6 7
 5  6  7  8  9 10 11          4 4 4 4 5 6 7
 6  7  8  9 10 11 12          5 5 5 5 5 6 7
 7  8  9 10 11 12 13          6 6 6 6 6 6 7
 8  9 10 11 12 13 14          7 7 7 7 7 7 7

      S∘.×S                         S∘.=S
 1  2  3  4  5  6  7          1 0 0 0 0 0 0
 2  4  6  8 10 12 14          0 1 0 0 0 0 0
 3  6  9 12 15 18 21          0 0 1 0 0 0 0
 4  8 12 16 20 24 28          0 0 0 1 0 0 0
 5 10 15 20 25 30 35          0 0 0 0 1 0 0
 6 12 18 24 30 36 42          0 0 0 0 0 1 0
 7 14 21 28 35 42 49          0 0 0 0 0 0 1

      S∘.-S                         S∘.≥S
 0 ¯1 ¯2 ¯3 ¯4 ¯5 ¯6          1 0 0 0 0 0 0
 1  0 ¯1 ¯2 ¯3 ¯4 ¯5          1 1 0 0 0 0 0
 2  1  0 ¯1 ¯2 ¯3 ¯4          1 1 1 0 0 0 0
 3  2  1  0 ¯1 ¯2 ¯3          1 1 1 1 0 0 0
 4  3  2  1  0 ¯1 ¯2          1 1 1 1 1 0 0
 5  4  3  2  1  0 ¯1          1 1 1 1 1 1 0
 6  5  4  3  2  1  0          1 1 1 1 1 1 1
Moreover, the graph of a function can be produced as
an "equal" table as follows. First recall the function G
defined earlier:


      ∇Z←G X
      Z←(X-3)×(X-5)∇

      G S
8 3 0 ¯1 0 3 8


The range of the function for this set of arguments is
from 8 down to ¯1, and the elements of this range are all
contained in the following vector:


      R←8 7 6 5 4 3 2 1 0 ¯1


Consequently, the "equal" table R∘.=G S produces a rough
graph of the function (represented by 1's) as follows:


      R∘.=G S
1 0 0 0 0 0 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 1 0 0 0 1 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 1 0 1 0 0
0 0 0 1 0 0 0
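
The construction of the graph can be followed step by step in Python. This sketch assumes the argument vector 1 through 7 and the range vector running from 8 down to negative 1; the rendering with `*` and `.` is an added convenience, not part of the original.

```python
def G(x):
    """The function defined earlier: (X-3)×(X-5)."""
    return (x - 3) * (x - 5)


S = list(range(1, 8))        # arguments 1 2 3 4 5 6 7
R = list(range(8, -2, -1))   # range values 8 7 6 ... 0 -1

values = [G(x) for x in S]

# "Equal" table: one row per element of R, one column per argument.
graph = [[int(r == v) for v in values] for r in R]

# Optional rendering, marking each 1 with an asterisk.
picture = ["".join("*" if cell else "." for cell in row) for row in graph]
```

Each column contains exactly one 1, in the row whose range value equals the value of the function for that argument.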


A.5 A PROGRAM FOR ELEMENTARY ALGEBRA 


The foregoing analysis suggests the development of an 
algebra curriculum with the following characteristics: 


1. The notation used is unambiguous, with simple and
consistent rules of syntax, and with provision for the
simple and direct use of arrays. Moreover, the
notation is not taught as a separate matter, but is
introduced as needed in conjunction with the concepts
represented.

2. Heavy use is made of arrays to display
mathematical properties of functions in terms of
patterns observed in vectors and matrices (tables),
and to make possible the reading, writing, and
evaluation of a host of interesting algebraic
sentences before approaching the analysis of sentences
and the concomitant development of identities.


Such an approach has been adopted in the present text, 
where it has been carried through as far as the treatment of 
polynomials and of linear functions and linear equations. 
The extension to further work in polynomials, to slopes and 
derivatives, and to the circular and hyperbolic functions is 
carried forward in Iverson [9] and in Orth [10].


It must be emphasized that the proposed notation, 
though simple, is not limited in application to elementary 
algebra. A glance at the bibliography of Rault and Demars 
[4] will give some idea of the wide range of applicability. 
The role of the computer. Because the proposed notation is
simple and systematic it can be executed by automatic
computers and has been made available on a number of
time-shared terminal systems. The most widely used of these
is described in Falkoff and Iverson [5]. It is important to
note that the notation is executed directly, and the user
need learn nothing about the computer itself. In fact, each
of the examples in this appendix is shown exactly as it
would be typed on a computer terminal keyboard.


The computer can obviously be useful in cases where a 
good deal of tedious computation is required, but it can be 
useful in other ways as well. For example, it can be used 
by a student to explore the behavior of functions and 
discover their properties. To do this a student will simply 
enter expressions which apply the functions to various 
arguments. If the terminal is equipped with a display 
device, then such exploration can even be done collectively 
by an entire class. This and other ways of using the 
computer are discussed in Berry et al [6] and in Appendix C. 
4 The Design of APL


A. D. Falkoff 
K. E. Iverson 


Abstract: This paper discusses the development of APL, emphasizing and illustrating the principles underlying its design. The principle
of simplicity appears most strongly in the minimization of rules governing the behavior of APL objects, while the principle of practicality
is served by the design process itself, which relies heavily on experimentation. The paper gives the rationale for many specific design
choices, including the necessary adjuncts for system management.


Introduction 

This paper attempts to identify the general principles
that guided the development of APL and its computer
realizations, and to show the role these principles played
in the evolution of the language. The reader will be
assumed to be familiar with the current definition of APL
[1]. A brief chronology of the development of APL is
presented in an appendix.

Different people claiming to follow the same broad
principles may well arrive at radically different designs;
an appreciation of the actual role of the principles in
design can therefore be communicated only by illustrating
their application in a variety of specific instances. It
must be remembered, of course, that in the heat of battle
principles are not applied as consciously or systematically
as may appear in the telling. Some notion of the evolution
of the ideas may be gained from consulting earlier
discussions, particularly Refs. 2-4.

The actual operative principles guiding the design of
any complex system must be few and broad. In the present
instance we believe these principles to be simplicity
and practicality. Simplicity enters in four guises:
uniformity (rules are few and simple), generality (a small
number of general functions provide as special cases a
host of more specialized functions), familiarity (familiar
symbols and usages are adopted whenever possible),
and brevity (economy of expression is sought).
Practicality is manifested in two respects: concern with actual
application of the language, and concern with the practical
limitations imposed by existing equipment.

We believe that the design of APL was also affected in
important respects by a number of procedures and
circumstances. Firstly, from its inception APL has been
developed by using it in a succession of areas. This
emphasis on application clearly favors practicality and
simplicity. The treatment of many different areas fostered
generalization: for example, the general inner
product was developed in attempting to obtain the
advantages of ordinary matrix algebra in the treatment of
symbolic logic.

Secondly, the lack of any machine realization of the
language during the first seven or eight years of its
development allowed the designers the freedom to make
radical changes, a freedom not normally enjoyed by
designers who must observe the needs of a large working
population dependent on the language for their daily
computing needs. This circumstance was due more
to the dearth of interest in the language than to foresight.

Thirdly, at every stage the design of the language was
controlled by a small group of not more than five people.
In particular, the men who designed (and coded) the
implementation were part of the language design group,
and all members of the design group were involved in
broad decisions affecting the implementation. On the
other hand, many ideas were received and accepted
from people outside the design group, particularly from
active users of some implementation of APL.

Finally, design decisions were made by Quaker consensus;
controversial innovations were deferred until
they could be revised or reevaluated so as to obtain
unanimous agreement. Unanimity was not achieved
without cost in time and effort, and many divergent
paths were explored and assessed. For example, many
different notations for the circular and hyperbolic
functions were entertained over a period of more than a year
before the present scheme was proposed, whereupon
it was quickly adopted. As the language grows, more
effort is needed to explore the ramifications of any major
innovation. Moreover, greater care is needed in
introducing new facilities, to avoid the possibility of later
retraction that would inconvenience thousands of users.
An example of the degree of preliminary exploration
that may be involved is furnished by the depth and
diversity of the investigations reported in the papers by
Ghandour and Mezei [5] and by More [6].


The character set 

The typography of a language to be entered at a simple 
keyboard is subject to two major practical restrictions: it 
must be linear, rather than two-dimensional, and it must 
be printable by a limited number of distinct symbols. 

When one is not concerned with an immediate machine
realization of a language, there is no strong reason
to so limit the typography and for this reason the
language may develop in a freer publication form. Before
the design of a machine realization of APL, the
restrictions appropriate to a keyboard form were not observed.
In particular, different fonts were used to indicate the
rank of a variable. In the keyboard form, such
distinctions can be made, if desired, by adopting classes of
names for certain classes of things.

The practical objective of linearizing the typography
also led to increased uniformity and generality. It led to
the present bracketed form of indexing, which removes
the rank limitation on arrays imposed by use of
superscripts and subscripts. It also led to the regularization of
the form of dyadic functions such as NαJ and NωJ (later
eliminated from the language). Finally, it led to writing
inner and outer products in the linear form +.× and ∘.×
and eventually to the recognition of such expressions as
instances of the use of operators.

The use of arrays and of operators greatly reduced the
demand for distinct characters in APL, but the limitations
imposed by the normal 88-symbol typewriter keyboard
fostered two innovations which greatly increased the
utility of the 88 symbols: the systematic use of most
function symbols to represent both a dyadic and a
monadic function, as suggested in conventional notation
by the double use of the minus sign to represent both
subtraction (a dyadic function) and negation (a monadic
function); and the use of composite characters formed
by typing one symbol over another (through the use of 
a backspace), as in > and ! and €. 

It was necessary to restrict the alphabetic characters
to a single font and capitals were chosen for readability.
Italics were initially favored because of their common
use for denoting variables in mathematics, but were
finally chosen primarily because they distinguished the
letter O from the digit 0 and letters like L and T from the
graphic symbols ⌊ and ⊤.

To allow the possibility of adding complete alphabetic
fonts by overstriking, the underscore (_), diaeresis
(¨), overbar (¯), and quad (⎕) were provided. In the
APL\360 realization, only the underscore is used in this
way. The inclusion of the overbar on the typeball
fortunately filled a need we had not anticipated: a symbol for
negative constants, distinct from the symbol for the
negation function. The quad proved a useful symbol alone
and in combination (as in E), and the diaeresis still re- 
mains unassigned. 

The SELECTRIC® typewriter imposed certain practical 
limitations on the placement of symbols on the keyboard, 
e.g., only narrow characters can appear in the upper 
row of the typing element. Within these limitations we 
attempted to make the keyboard easy to learn by group- 
ing related symbols (such as the relations) in a rational 
order and by making mnemonic associations between 
letters and the functions associated with them in the 
shifted case (such as the magnitude function | with M,
and the membership symbol ∈ with E).


Valence and order of execution 

The valence of a function is the number of arguments it 
takes; APL primitives have valences of 1 (monadic 
functions) and 2 (dyadic functions), and user-defined 
functions may have a valence of 0 as well. The form for 
all APL primitives follows the familiar model of arithme- 
tic, that is, the symbol for a dyadic function occurs be- 
tween its arguments (as in 3+4) and the symbol for a 
monadic function occurs before its argument (as in -4). 

A function f of valence greater than two is conven- 
tionally written in the form f(a,b,c,d). This can be 
construed as a monadic function F applied to the vector 
argument a,b,c,d, and this interpretation is used in 
APL. In the APL\360 realization, the arguments a,b,c, 
and d must share a common structure. The definition 
and implementation of generalized arrays, whose ele- 
ments include enclosed arrays, will, of course, remove 
this restriction. 

The result of any primitive APL function depends only 
on its immediate arguments, and the interpretation of 
each part of an APL statement is therefore localized. Like- 
wise, the interpretation of each statement is independent 
of other statements in a program. This independence of 
context contributes significantly to the readability and 
ease of implementation of the language. 

The order of execution of an APL expression is con- 
trolled by parentheses in the familiar way, and parenthe- 
ses are used for no other purpose. The order is other- 
wise determined by one simple rule: the right argument 
of any function is the value of the entire expression following
it. In particular, there is no precedence among
functions; all functions, user-defined as well as primitive,
are treated alike.

This simple rule has several consequences of practical 
advantage to the user: 


a) An unparenthesized expression is easy to read from
left to right because the first function encountered is
the major function, the next is the major function in
its right argument, etc.

b) An unparenthesized expression is also easy to read
from right to left because this is the order in which it
is executed.

c) If T is any vector of numerical terms, then the present
rule makes the expressions -/T and ÷/T very
useful: the former is the alternating sum of T and the
latter is the alternating product. Moreover, a continued
fraction may be written without parentheses in
the form 3+÷4+÷5+÷6, and the efficient evaluation
of a polynomial can be written without parentheses in
the form 3+X×4+X×5+X×6.
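The consequences listed above can be checked with a short sketch. It is written in Python rather than APL, and the helper name reduce_right is an assumption introduced here: it models a reduction such as -/T as a right-to-left fold, which is what the rule about right arguments implies.

```python
from fractions import Fraction
from functools import reduce


def reduce_right(f, items):
    """Model f/T: fold so that each function takes everything to its right."""
    items = list(items)
    return reduce(lambda acc, x: f(x, acc), reversed(items[:-1]), items[-1])


T = [1, 2, 3, 4]

# -/T is the alternating sum: 1-(2-(3-4)) equals 1-2+3-4.
alternating_sum = reduce_right(lambda a, b: a - b, T)

# Divide-reduction is the alternating product: 1/(2/(3/4)) equals (1*3)/(2*4).
alternating_product = reduce_right(lambda a, b: Fraction(a) / b, T)

# The continued fraction, read right to left: 3+1/(4+1/(5+1/6)).
continued_fraction = 3 + 1 / (4 + 1 / (5 + Fraction(1, 6)))

# The polynomial form, read right to left, is Horner's rule for 3+4X+5X^2+6X^3.
X = 2
horner = 3 + X * (4 + X * (5 + X * 6))
```

Exact fractions are used so that the comparisons do not depend on floating-point rounding.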


The rule that multiplication is executed before addi- 
tion and that the power function is executed before mul- 
tiplication has been long accepted in mathematics. In 
discarding any established rule it is wise to speculate on 
the reasons for its adoption and on whether they still 
apply. This rule makes parentheses unnecessary in the 
writing of polynomials, and this alone appears to be a 
sufficient reason for its original adoption. However, in 
APL a polynomial can be written more perspicuously in 
the form +/C×X*E, which also requires no parentheses.
The question of the order of execution has been dis- 
cussed in several places: Falkoff et al. [2,3], Berry [7], 
and Appendix A of Iverson [8]. 

The order in which isolated parts of a statement, such 
as the parts (X+4) and (Y-2) in the statement (X+4)×(Y-2),
are executed is normally immaterial, but does
matter when repeated specifications are permitted in a
statement as in (A←2)+A. Although the use of such expressions
is poor practice, it is desirable to make the
interpretation unequivocal: the rule adopted (as given in
Lathwell and Mezei [9]) is that the rightmost function or 
specification which can be performed is performed first. 
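
As a worked instance of this rule, the sketch below replays a statement that adds a specification of A to a reference to A, with the two isolated parts evaluated in an explicit order. It is a Python model, not APL; the workspace dictionary, the helper names, and the starting value 5 for A are assumptions made for illustration.

```python
def evaluate(order):
    """Add 'A specified as 2' to 'A', taking the two parts in the given order."""
    workspace = {"A": 5}   # assumed value of A before the statement runs
    parts = {}

    def specify():         # left part: assign 2 to A, then yield that value
        workspace["A"] = 2
        parts["left"] = workspace["A"]

    def reference():       # right part: read whatever A holds at this moment
        parts["right"] = workspace["A"]

    steps = {"left": specify, "right": reference}
    for name in order:
        steps[name]()
    return parts["left"] + parts["right"]


rightmost_first = evaluate(["right", "left"])   # the adopted rule: 2 + 5
leftmost_first = evaluate(["left", "right"])    # the other order: 2 + 2
```

The two orders give different results, which is why the definition had to settle on one of them.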

It is interesting to note that the use of embedded as- 
signment was first suggested during the course of the 
implementation when it was realized that special steps 
were needed to prevent it. The order of executing isolated
parts of a statement was at first left unspecified
(as stated in Falkoff and Iverson [1]) to allow freedom 
in implementation, since isolated parts could then be 
executed in parallel on any machine offering parallel 
processing. However, embedded assignment found such 
wide use that an unambiguous definition became es- 
sential to fix the behavior of programs moving from 
system to system. 


Another aspect of the order of execution is the order 
among statements, which is normally taken as the order 
of appearance, except as modified by explicit branches.
In the publication form of the language branches were 
denoted by arrows drawn from a branch point to the set 
of possible destinations, and the drawing of branch ar- 
rows is still to be recommended as an adjunct for clari- 
fying the structure of a program (Iverson [10], page 3). 

In formalizing branching it was necessary to introduce 
only one new concept (denoted by →) and three simple
conventions: 1) continuing with the statement indicated
by the first element of a vector argument of →, or with the
next statement in sequence if the argument is an empty
vector, 2) terminating the function if the indicated continuation
is not the index of a statement in the program,
and 3) the use of labels, local names defined by the indices
of juxtaposed statements. At first labels were
treated as local variables, but it was found to be more 
convenient in both use and implementation to treat them 
as local constants. 


Since the branch arrow can be followed by any valid
expression it provides convenient multi-way conditional
branches. For example, if L is a Boolean vector and S is
a corresponding set of statement numbers (often formed
as the catenation of a set of labels), then →L/S provides
a (1+ρL)-way branch (to one of the elements of S or
falling through if every element of L is zero); if I is an
empty vector or an index to the vector S, then →S[I]
provides a similar (1+ρL)-way branch.
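
The branch conventions and the compression idiom can be sketched as follows. This is a Python model under stated assumptions: the function names, the statement numbers in S, and the numbering of statements from 1 are all hypothetical choices made for the example.

```python
def compress(flags, items):
    """Model compression: keep the items whose corresponding flag is 1."""
    return [item for flag, item in zip(flags, items) if flag]


def branch(argument, next_statement, last_statement):
    """Model the conventions for the branch arrow.

    Returns the number of the statement to run next, or None when the
    function terminates. Statements are assumed to be numbered from 1.
    """
    if len(argument) == 0:             # empty vector: continue in sequence
        return next_statement
    target = argument[0]               # otherwise use the first element
    if not 1 <= target <= last_statement:
        return None                    # not a statement index: terminate
    return target


S = [4, 7, 9]   # hypothetical statement numbers, e.g. the values of three labels

taken = branch(compress([0, 1, 1], S), next_statement=3, last_statement=10)
fall_through = branch(compress([0, 0, 0], S), next_statement=3, last_statement=10)
terminated = branch([0], next_statement=3, last_statement=10)
```

With three conditions there are four possible outcomes: one of the three targets, or falling through to the next statement.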

Programming languages commonly incorporate special 
forms of sequence control, typified by the DO statement 
of FORTRAN. These forms are excluded from APL because
their cost in complication of the language outweighs
their utility. The array operations in APL obviate
many instances of iteration, and those which remain can 
be represented in a variety of ways. For example, group- 
ing the initialization, modification, and testing of the con- 
trol variable at the head of the iterated segment provides 
a particularly perspicuous arrangement. Moreover, 
specialized sequence control statements are usually 
context dependent and necessarily introduce new rules. 

Conditional statements of the IF THEN ELSE type are 
not only context dependent, but their inherent limitation 
to a sequence of binary choices often leads to awkward 
constructions. These, and other, special sequence con- 
trol forms can usually be modeled readily in APL and pro- 
vided as application packages if desired. 


Scalar functions 

The emphasis on generality is illustrated in the defini- 
tions of many of the scalar functions. For example, the 
definition of the factorial is not limited to non-negative 
integers but is extended in the manner of the gamma 
function. Similarly, the residue is extended to all numbers
in a simple and useful way: M|N is defined as the
smallest (in magnitude) among the quantities N-M×I
(where I is an integer) which lie in the range from 0 to
M. If no such quantity exists (as in the case where M is
zero) then the restriction to the range 0 to M is discarded,
that is, 0|X is X. As another example, 0*0 is defined
as 1 because that is the limiting value of X*Y when the
point 0 0 is approached along any path other than the X
axis, and because this definition is needed to make the
common general form of writing a polynomial (in which
the constant term C is written as C×X*0) applicable when
the value of the argument X is zero.
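
A minimal Python sketch of these two definitions follows. It leans on the fact that Python's % operator gives a result with the sign of the divisor, which is assumed here to match the range rule for the residue; the function names are introduced for illustration only.

```python
def residue(m, n):
    """Model the residue: n minus m times an integer, lying between 0 and m."""
    if m == 0:
        return n        # no such value exists, so the range restriction is dropped
    return n % m        # Python's % takes the sign of the divisor


def polynomial(coefficients, x):
    """Sum of c times x to the power e, for exponents 0, 1, 2, ..."""
    return sum(c * x ** e for e, c in enumerate(coefficients))


at_zero = polynomial([3, 4, 5, 6], 0)   # relies on 0 to the power 0 being 1
at_two = polynomial([3, 4, 5, 6], 2)
```

Because Python also takes 0 to the power 0 as 1, the constant term survives when the argument is zero, which is the point made in the text.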


The urge to generality must be tempered to avoid set- 
ting traps for the unwary, and compromise is sometimes 
necessary. For example, X÷0 could be defined as infinity
(i.e., the largest representable number in an implementation)
so as to obviate special treatment of the case Y=0
when computing the arc tangent of X÷Y, but is instead
defined to yield a domain error. Nevertheless, 0÷0 is
given the value 1, in spite of the fact that the mathe- 
matical argument for it is much weaker than that for 0*0, 
because it was deemed desirable to avoid an error stop 
in this case. 

Eventually it will be desirable to be able to set sepa- 
rate limits on domains to suit various classes of users. 
For example, an implementation that incorporates complex
numbers must yield a result for the expression
¯1*.5 but should admit of being set to yield a domain
error for a user studying elementary arithmetic. The 
experienced user should be permitted to use an imple- 
mentation in a mode that gives him complete control of 
domain and other errors, i.e., an error should not stop 
execution but should give necessary information about 
the error in a form which can be used by the program in 
which it occurs. Such a facility has not yet been incorpo- 
rated in APL implementations. 

A very general and useful set of functions was introduced
by adopting the relation symbols < ≤ = ≥ > ≠ to
represent functions (i.e., propositions) rather than assertions.
The result of any proposition was defined to be 0
or 1 (rather than, say, true or false) so that it would lie
in the domain of other arithmetic functions. Thus X=Y
and X≠Y represent general comparisons, but if X and Y
are integers then X=Y is the Kronecker delta and X≠Y is
its inverse; if X and Y are Boolean variables, then X≠Y is
the exclusive-or and X≤Y is material implication. This
definition also allows expressions that incorporate
both relational and arithmetic functions (such as
(2=+/[1]0=S∘.|S)/S←⍳N, which yields the primes up
to integer N). Moreover, identities among Boolean func- 
tions are more evident when expressed in these terms 
than when expressed in more conventional symbols. 
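
The primes expression can be unpacked step by step. The sketch below is a Python model (the function names are assumptions): it builds the divisibility table, sums it down the first coordinate, compares the counts with 2, and compresses the integers by the resulting 0/1 vector.

```python
def iota(n):
    """The integers 1..n (index origin 1 is assumed)."""
    return list(range(1, n + 1))


def primes_up_to(n):
    """Model of the primes expression: select integers with exactly two divisors."""
    s = iota(n)
    # Outer product of residues compared with 0:
    # table[i][j] is 1 when s[i] divides s[j].
    table = [[int(b % a == 0) for b in s] for a in s]
    # Sum over the first coordinate: a divisor count for each column.
    divisor_counts = [sum(row[j] for row in table) for j in range(len(s))]
    # Compare with 2 and compress s by the resulting Boolean vector.
    return [x for x, count in zip(s, divisor_counts) if int(count == 2)]


# With 0/1 results, relations double as Boolean functions.
exclusive_or = [[int(x != y) for y in (0, 1)] for x in (0, 1)]
implication = [[int(x <= y) for y in (0, 1)] for x in (0, 1)]
```

The two small tables at the end are the truth tables obtained when the arguments are restricted to 0 and 1.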

The adoption of the relation symbols as functions 
does not preclude their use as assertions in informal sentences.
For example, although one might feel compelled
to substitute “X<Y is true” for “X<Y” in the sentence
“If X<Y then (X<Y)∨(X=Y)”, there is no more reason
to do so than to substitute “Bob is there is true” for 
“Bob is there” in the sentence which begins “If Bob is 
there then. . .” 


Although we strove to adopt familiar symbols and 
usage, any clash with the principle of uniformity was 
invariably resolved in favor of uniformity. For example, 
familiar symbols (such as + - × ÷) are used where
possible, but anomalies such as |X| for magnitude and
N! for factorial are regularized to |X and !N. Notation
such as Xᴺ for power and the two-level form (N over M)
for the binomial coefficient
are replaced by regular dyadic forms X*N and M!N.
Elision of the times sign is not permitted; this allows the
use of multiple-character names and avoids confusion
between multiplication, as in X(X+3), and the application
of a function, as in F(X+3).

Moreover, each of the primitive scalar functions in 
APL is extended to arrays in exactly the same way. In
particular, if V and W are vectors the expressions V×W
and 3+V are permitted as well as the expressions V+W
and 3×V, although only the latter pair would be permitted
(in the sense used in APL) in conventional vector
algebra. 

One view of simplicity might exclude as redundant 
those functions which are easily expressed in terms of 
others. For example, ⌈X may be written as -⌊-X, and
⌈/X may be written as -⌊/-X, and ∧/L may be written
as ~∨/~L. From another viewpoint it is simpler to use a
more complete or symmetric set of primitives, since one 
need not remember which of a pair is provided and how to 
express the other in terms of it. In APL, completeness has 
been favored. For example, symbols are provided for all 
of the nontrivial logical functions although all are easily 
expressed in terms of a small subset of them. 
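
The three redundancies mentioned above can be confirmed numerically. The following is a Python sketch with arbitrarily chosen sample data; the variable names are illustrative.

```python
import math

X = [-2.5, -1.0, 0.0, 1.25, 3.0]
L = [1, 0, 1, 1]

# Ceiling expressed through floor and negation, element by element.
ceiling_direct = [math.ceil(x) for x in X]
ceiling_via_floor = [-math.floor(-x) for x in X]

# Maximum-reduction expressed through minimum-reduction and negation.
maximum_direct = max(X)
maximum_via_minimum = -min(-x for x in X)

# And-reduction expressed through or-reduction and not (de Morgan).
and_direct = int(all(L))
and_via_or = int(not any(not b for b in L))
```

Each pair agrees, so either member could be dropped; the text explains why both were kept.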

The use of the circle to denote the whole family of 
functions related to the circular functions is a practical 
technique for conserving symbols as well as a useful 
generalization. It leads to many convenient expressions 
involving reduction and inner and outer products (such 
as 1 2 3∘.○X for a table of sines, cosines and tangents).
Moreover, anyone wishing to use the symbol
SIN for the sine function can define the function SIN as
either 1○X (for radian arguments) or 1○X×○1÷180 (for
degree arguments). The notational scheme employed for
the circular functions must clearly be used with discretion;
it could be used to replace all monadic functions by
a single dyadic function with an integer left argument to 
encode each monadic function. 
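
A small Python sketch of this scheme follows. Only the left arguments 1, 2 and 3 are modelled, and the names circle, outer, sin_radians and sin_degrees are assumptions for illustration; degree arguments are converted to radians by multiplying by pi and dividing by 180.

```python
import math

# Left arguments 1, 2 and 3 select sine, cosine and tangent (the only
# members of the family modelled in this sketch).
CIRCULAR = {1: math.sin, 2: math.cos, 3: math.tan}


def circle(k, x):
    """Model the dyadic circle function for k in 1 2 3."""
    return CIRCULAR[k](x)


def outer(f, left, right):
    """Model the outer product of two vectors under the function f."""
    return [[f(a, b) for b in right] for a in left]


def sin_radians(x):
    return circle(1, x)


def sin_degrees(x):
    return circle(1, x * math.pi / 180)   # convert degrees to radians first


angles = [0.0, math.pi / 6, math.pi / 4]
table = outer(circle, [1, 2, 3], angles)   # rows: sines, cosines, tangents
```

Because the selector is an ordinary integer argument, the whole table comes from a single outer product.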


Operators

The dot in the expression M+.×N is an example of an
operator; it takes functions (in this case + and ×) as
arguments and produces a new function called an inner
product. (In elementary mathematics the term operator 
is also used as a synonym for function, but in APL we 
eschew this usage.) The evolution of operators in APL 
furnishes an example of growing generality which has as 
yet been neither fully exploited nor fully regularized. 
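
In Python terms an operator in this sense is a higher-order function. The sketch below is an illustration only (the names inner and reduce_right are assumptions): inner takes two functions and returns the derived inner-product function for matrices held as nested lists.

```python
from functools import reduce
import operator


def reduce_right(f, items):
    """Model reduction on a non-empty vector as a right-to-left fold."""
    items = list(items)
    return reduce(lambda acc, x: f(x, acc), reversed(items[:-1]), items[-1])


def inner(f, g):
    """Model the dot operator: take two functions, return the derived function."""
    def derived(m, n):
        columns = list(zip(*n))
        return [[reduce_right(f, [g(a, b) for a, b in zip(row, col)])
                 for col in columns] for row in m]
    return derived


matrix_product = inner(operator.add, operator.mul)   # plus dot times
two_step_paths = inner(operator.or_, operator.and_)  # or dot and, on 0/1 matrices

M = [[1, 2], [3, 4]]
N = [[5, 6], [7, 8]]
ADJACENCY = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```

The same derived-function machinery yields the ordinary matrix product and, with a different pair of functions, a test for two-step connections in a graph.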

The operators now in APL were introduced one by one 
(reduction, then inner product, then outer product, then 
axis operators such as ⌽[I]) without being recognized
as members of a class. When this class property was
recognized it was apparent that the operators had not
been given a consistent syntax and that the notation
should eventually be regularized to give operators the
same syntax as functions, i.e., an operator taking two
arguments occurs between its (function) arguments (as
in +.×) and an operator taking one argument appears in
front of it. It also became evident that our treatment of
operators had introduced a useful hierarchy into the
order of execution, operators being executed before 
functions. 


The recognition of operators as such has also made 
clear the much broader role they might be expected to 
play —derivative and integral operators are only two of 
many useful operators that must be added to the lan- 
guage. 

The use of the outer product operator furnishes a 
clear example of a significant process in the evolution of 
the language: when a new facility is introduced it takes 
considerable time to recognize the many ways in which 
it can be used and therefore to appreciate its role in the 
further development of the language. The notation αʲ(n)
(later regularized to NαJ) had been introduced early to
represent a prefix vector, i.e., a Boolean vector of N elements
with J leading 1's. Some thought had been given
to extending the definition to a vector J (perhaps to
yield an N-column matrix whose rows were prefix vectors
determined by the elements of J) but no decision
had been taken. When considering such an extension we
normally communicate by defining any proposed notation
in terms of existing primitives. After the outer product
was introduced the proposed extension was written
simply as J∘.≥⍳N, and it became clear that the function
α was now redundant.
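
The equivalence between the prefix function and the outer-product form can be checked with a short Python model; the names iota, outer and prefix_vector, and the sample values of J and N, are assumptions for illustration.

```python
import operator


def iota(n):
    """The integers 1..n (index origin 1 is assumed)."""
    return list(range(1, n + 1))


def outer(f, left, right):
    """Outer product of two vectors under f, with 0/1 results."""
    return [[int(f(a, b)) for b in right] for a in left]


def prefix_vector(n, j):
    """The old prefix function: a Boolean vector of n elements with j leading 1's."""
    return [1] * j + [0] * (n - j)


J = [0, 2, 3]
N = 4
prefix_matrix = outer(operator.ge, J, iota(N))   # one prefix vector per element of J
```

Replacing operator.ge by another relation gives the related tables mentioned in the next paragraph of the text without any new primitive.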

One should not conclude from this example that every 
function or set of functions easily expressed in terms of 
another is discarded as redundant; judgment must be 
exercised. In the present instance the α was discarded
partly because it was too restrictive, i.e., the outer product
form could be applied to yield a host of related functions
(such as J∘.<⍳N and J∘.<⌽⍳N) not all of which
were expressible in terms of the prefix and suffix functions
α and ω. As mentioned in the discussion of scalar
functions, the completeness of an obvious family of 
functions is also a factor to be considered. 


Operators are attractive from several points of view. 
Because they provide a scheme for denoting whole 
classes of related functions, they offer uniformity of 
expression and great economy of symbols. The concise- 
ness of expression that they allow can also be directly 
related to efficiency of implementation. Moreover, 
they introduce a new level of generality which plays an 
important role in the formal manipulability of the lan- 
guage. 


Formal manipulation 

APL is rich in identities and is therefore amenable to a 
great deal of fruitful formal manipulation. For example, 
many of the familiar identities of ordinary matrix algebra 
extend to inner products other than +.×, and de Morgan’s
law and other dualities extend to inner and outer
products on arrays. The emphasis on generality, unifor- 
mity, and simplicity is likely to lead to a language rich in 
identities, but our emphasis on identities has been such 
that it should perhaps be enunciated as a separate and 
important guiding principle. Indeed, the preface to Iver- 
son [10] cites one chapter (on the logical calculus) as 
illustration of “the formal manipulability of the language 
and its utility in theoretical work”. A variety of identi- 
ties is treated in [10] and [11], and a schema for proofs 
in APL is presented in [12]. 

Two examples will be used to illustrate the role of 
identities in the development of the language. The iden- 
tity 
    (+/X)=(+/U/X)++/(~U)/X


applies for any numerical vector X and logical vector U. 
Maintaining this identity for the case where U is a vector 
of zeros forces one to define the sum over an empty 
vector as zero. A similar identity holds for reduction by 
any associative and commutative function and leads one 
to define the reduction of an empty array by any func- 
tion as the identity element of that function. 
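
The argument can be followed in a small Python model. The function names are assumptions, and functools.reduce folds from the left, which is harmless here because the identity is stated for associative and commutative functions.

```python
from functools import reduce
import operator


def compress(flags, items):
    """Keep the items whose corresponding flag is 1."""
    return [x for u, x in zip(flags, items) if u]


def reduction(f, identity, items):
    """Reduce by f, returning the identity element of f for an empty vector."""
    return reduce(f, items, identity)


def identity_holds(u, x):
    """Check that the sum of x equals the sum of the selected plus the unselected."""
    not_u = [1 - flag for flag in u]
    total = reduction(operator.add, 0, x)
    split = (reduction(operator.add, 0, compress(u, x))
             + reduction(operator.add, 0, compress(not_u, x)))
    return total == split


empty_sum = reduction(operator.add, 0, [])
empty_product = reduction(operator.mul, 1, [])
```

When U is all zeros the first term is a sum over an empty vector, and the identity holds only because that sum is taken to be zero.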

The dyadic transpose I⍉A performs a general permutation
on the coordinates of A as specified by the argument
I. The monadic transpose is a special case which,
in order to yield ordinary matrix transpose for an array 
of rank two, was initially defined to interchange the last 
two coordinates. It was later realized that the identity 


    ∧/,(M+.×N)=⍉(⍉N)+.×⍉M


expected to hold for matrices would not hold for higher 
rank arrays. To make the identity true in general, the 
monadic transpose was defined to reverse the order of 
the coordinates as follows: 


    ∧/,(⍉A)=(⌽⍳ρρA)⍉A.
Moreover, the form chosen for the left argument of the 
dyadic transpose led to the following important identity: 


    ∧/,(I⍉J⍉A)=I[J]⍉A.
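
The last identity can be verified on a rank-3 example. The sketch below is a Python model under stated assumptions: arrays are held as dictionaries from 0-origin index tuples to values, the left argument is a full permutation, and coordinate n of the argument becomes coordinate perm[n] of the result.

```python
from itertools import product


def make_array(shape):
    """Build a test array: a dict from 0-origin index tuples to distinct values."""
    return {index: number
            for number, index in enumerate(product(*map(range, shape)))}


def dyadic_transpose(perm, array):
    """Model the dyadic transpose for a permutation perm (0-origin)."""
    result = {}
    for index, value in array.items():
        new_index = [None] * len(index)
        for n, axis in enumerate(perm):
            new_index[axis] = index[n]   # coordinate n moves to position perm[n]
        result[tuple(new_index)] = value
    return result


def monadic_transpose(array):
    """Model the monadic transpose: reverse the order of the coordinates."""
    rank = len(next(iter(array)))
    return dyadic_transpose(tuple(reversed(range(rank))), array)


A = make_array((2, 3, 4))
I = (1, 2, 0)
J = (2, 0, 1)
I_of_J = tuple(I[j] for j in J)   # I indexed by J
```

Applying J and then I moves coordinate n first to J[n] and then to I[J[n]], which is exactly what a single transpose by I indexed by J does.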


Execute and format

In designing an executable language there is a funda- 
mental choice to be made: Is the statement of an expres- 
sion to be taken as an order to evaluate it, or must the 
evaluation be indicated by an explicit function in the 
language? This decision was made very early in the de- 
velopment of APL, albeit with little deliberation. Never- 
theless, once the choice became manifest, early in the 
development of the implementation, it was applied uni- 
formly in all situations. 


There were some arguments against this, of course, 
particularly in the application of a function to its argu- 
ments, where it is often useful to be able to “call by 
name,” which requires that the evaluation of the argu- 
ment be deferred. But if implemented literally (i.e., if 
functions could be defined with this as an option) then 
names per se would have to be known to the language 
and would constitute an additional object type with its 
own rules of behavior and specialized primitive func- 
tions. A deliberate effort had been made to eliminate 
unnecessary type distinctions, as in the uniform lan- 
guage treatment of numbers regardless of their internal 
representation, and this point of view prevailed. In the 
interest of keeping the semantic rules simple, the idea of
“call by name” was rejected as a primitive concept in APL. 

Nevertheless, there are important cases where the 
formal argument of a function should not be evaluated at 
the time of invocation—as in the application of a gen- 
eralized root finder to an arbitrary function. There are 
also situations where it is useful to inhibit evaluation of 
an expression, as in certain conditional forms, and the 
need for some treatment of the problem was clear. The 
basis for a solution was at hand in the form of character 
arrays, which were already objects of the language. Ef- 
fectively, putting quotes around a statement inhibits its 
execution by making it a data item, a character array 
subject to the normal language functions. To get the ef- 
fect of working with names, or with expressions to be 
conditionally evaluated, it was only necessary to introduce
the notion of “unquote,” or more properly “execute,”
as a function that would cause a character array
to be evaluated as if it were the same expression without 
the inhibition. 

The actual introduction of the execute function did 
not come for some time after its recognition as the likely 
solution. The development that preceded its final accep- 
tance into APL illustrates several design principles. 

The concept of an execute function is a very powerful 
one. In a sense, it makes the language “self-conscious,” 
and introduces endless possibilities for obscurity in pro- 
grams. This might have been a reason for not allowing it, 
but we had long since realized that a general-purpose 
language cannot be made foolproof and remain effective. 
Furthermore, APL is easily partitioned, and beginning 
users, or users of application packages, need not know
about more sophisticated aspects of the language. The 
real issues were whether the function was of sufficiently 
broad utility, whether it could be defined simply, and 
whether it was perhaps a special case of a more general 
capability that should be implemented instead. There 
was also the need to establish a symbol for it. 

The case for general utility was easily made. The exe- 
cute function does allow names to be used as arguments 
to functions without the need for a new data type; it 
provides the means for generating variables under pro- 
gram control, which can be useful, for example, in man- 
aging data that do not conveniently fit into rectangular 
arrays; it allows the construction and execution of statements
under program control; and in interpretive implementations
it provides conversion from characters to
numbers at machine speeds. 


The behavior of the execute function is simply de- 
scribed: it treats a character array argument as a repre- 
sentation of an APL statement and attempts to evaluate or 
execute the statement so represented. System commands 
and attempts to enter function definition mode are not 
valid APL statements and are excluded from the domain 
of execute. It can be said that, except for these exclu- 
sions, execute acts upon a character array as if the ele- 
ments of the array were entered at a terminal in the im- 
mediate execution mode. 
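
As a loose analogy only, Python's eval can stand in for execute to show the root-finder use mentioned earlier: the function to be solved is passed as characters and evaluated on demand. The names execute and find_root, the workspace dictionary, and the use of Python expression syntax inside the quotes are all assumptions of this sketch, not features of APL.

```python
def execute(text, workspace):
    """Rough analogue of execute: evaluate an expression held as characters.

    The workspace dict plays the part of the names visible to the statement.
    Built-in names are withheld so that only the workspace is consulted.
    """
    return eval(text, {"__builtins__": {}}, workspace)


def find_root(expression, low, high, tolerance=1e-12):
    """Bisection root finder whose argument is an expression in X, given as text."""
    def f(x):
        return execute(expression, {"X": x})

    if f(low) * f(high) > 0:
        raise ValueError("the expression must change sign on the interval")
    while high - low > tolerance:
        middle = (low + high) / 2
        if f(low) * f(middle) <= 0:
            high = middle
        else:
            low = middle
    return (low + high) / 2


root = find_root("X*X - 2", 0.0, 2.0)   # Python syntax inside the quotes
number = execute("12.5", {})            # characters converted to a number
```

The root finder never sees a function object, only a character string, which is the sense in which quoting defers evaluation.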

Incidentally, there was pressure to arbitrarily include 
system commands in the domain of execute as a means 
of providing access to other workspaces under program 
control in order to facilitate work with large collections 
of data. This was resisted on the basis that the execute 
function should not allow by subterfuge what was other- 
wise disallowed. Indeed, consideration of this aspect of 
the behavior of execute led to the removal of certain 
anomalies in function definition and a clarification of the 
role of the escape characters ) and ∇.

The question of generality has not been finally settled. 
Certainly, the execute function could be considered a 
member of a class that includes constructs like those of 
the lambda calculus. But it is not necessary to have the 
ultimate answer in order to proceed, and the simplicity 
of the definition adopted gives some assurance that gen- 
eralizations are not being foreclosed. 

For some time during its experimental implementation 
the symbol for execute was the epsilon. This was chosen 
for obvious mnemonic reasons and because no other 
monadic use was made of this symbol. As thought was 
being given to another new function, format, it was
observed that over some part of each of their domains 
format and execute were inverses. Furthermore, over 
these parts of their domains they were strongly related to 
the functions encode and decode, and we therefore 
adopted their symbols overstruck by the symbol ∘.


The format function furnishes another example of a 
primitive whose behavior was first defined and long ex- 
perimented with by means of APL defined functions. 
These defined functions were the DFT (Decimal 
Format) and EFT (Exponential Format) familiar to 
most users of the APL system. The main advantage of 
the primitive format function over these definitions is its 
much more efficient use of computer time. 

The format function has both a dyadic and a monadic 
definition, but the execute function is monadic only. 
This leaves the way open for a related dyadic function, 
for which there has been no dearth of suggestions, but 
none will be adopted until more experience has been 
gained in the use of what we already have. 


System commands and other environmental 
facilities 

The definition of APL is purely abstract: the objects of 
the language, arrays of numbers and characters, are act- 
ed upon by the primitive functions in a manner indepen- 
dent of their representation and independent of any 
practical interpretation placed upon them. The advan- 
tages of such an abstract definition are that it makes the 
language truly machine independent, and avoids bias in 
favor of particular application areas. But not everything 
in a computing system is abstract, and provision must be 
made to manage system resources and otherwise com- 
municate with the environment in which the language 
functions operate. 

Maintaining the abstract nature of the language in a 
real computing system therefore seemed to imply a need 
for language-like facilities in some sense outside of APL. 
The need was first met by the use of system commands, 
which are syntactically not part of APL, and are also ex- 
cluded from dynamic use within APL programs. They 
provided a simple and, in some ways, convenient answer 
to the problem of system management, but proved insuf- 
ficient because the actions and information provided by 
them are often required dynamically. 

The exclusion of system commands from programs 
was based more strongly on engineering considerations 
than on a theoretic compulsion, since the syntactic dis- 
tinction alone sets them apart from the language, but 
there remained a reluctance to allow such syntactic 
anomalies in a program. The real issue, which was 
whether the functions provided by the system com- 
mands were properly the province of APL, was tabled for 
the time being, and defined functions that mimic the ac- 
tions of certain of them were introduced to allow dy- 
namic execution. The functions so provided were those 
affecting only the environment within a workspace, such 
as width and origin, while those that would have affected 
major physical resources of the system were still exclud- 
ed for engineering reasons. 


These environmental defined functions were based on 
the use of still another class of functions—called “I- 
beams” because of the shape of the symbol used for 
them — which provide a more general facility for commu- 
nication between APL programs and the less abstract 
parts of the system. The I-beam functions were first in- 
troduced by the system programmers to allow them to 
execute System/360 instructions from within APL pro- 
grams, and thus use APL as a direct aid in their program- 
ming activity. The obvious convenience of functions of 
this kind, which appeared to be part of the language, led 
to the introduction of the monadic I-beam function for 
direct use by anyone. Various arguments to this function 
yielded information about the environment such as avail- 
able space and time of day. 

Though clearly an ad hoc facility, the I-beam func- 
tions appear to be part of the language because they 
obey APL syntax and can be executed from within an 
APL program. They were too useful to do without in the 
absence of a more rational solution to the problem, and 
so were graced with the designation “system-dependent
functions,” while we continued to use the system and 
think about the general problem of communication 
among the subsystems composing it. 


Shared variables 

The logical basis for a generalized communication facil- 
ity in APL\360 was laid in 1964 with the publication of 
the formal description of System/360 [2]. It was then 
observed that the interaction between concurrent “asyn- 
chronous” processes (programs) could be completely 
comprehended by an interface comprising variables that 
were shared by the cooperating processes. (Another fa- 
cility was also used, where one program forced a branch 
in another, but this can be regarded as a derivative rep- 
resentation based on variables shared between one 
program and a processor that drives the other.) It was 
not until six or seven years later, however, that the full 
force of this observation was brought to bear on the 
practical problem of controlling in an organic way the 
environment in which APL programs run. 

Three processors can be identified during the execution
of an APL program: APL, or the processor that actually
executes the program; the system, or host that
manages libraries and other environmental factors, 
which in APL\360 is the System/360 processor; and the 
user, who may be observing and processing output or 
providing input to the program. The link between APL 
and system is the set of I-beam functions, that between 
user and system is the set of system commands, and 
between user and APL, the quad and quote-quad. With 
the exception of the quote-quad, which is a true variable, 
all these links are constructs on the interfaces rather 
than the interfaces themselves. 

It can be seen that the quote-quad is shared by the
user and APL. Characteristically, a value assigned to it in 
a program is presented to the user at the terminal, who 
utilizes this information as he sees fit. If later read by the 
program, the value of the quote-quad then has no fixed 
relationship to what was earlier specified by the pro- 
gram. The values written and read by the program are 
a fortiori APL objects—abstract arrays—but they may 
have practical significance to the user-processor, suggesting,
for example, that an experimental observation
be made and the results entered at the keyboard. 

Using the quote-quad as the paradigm for their behav- 
ior, a general facility for shared variables was designed 
and implemented starting in late 1969 (see Lathwell 
[13]). The underlying concept was to provide communi- 
cation across the boundary between independent proces- 
sors by explicitly establishing certain variables as being 
shared between them. A shared variable is syntactically 
indistinguishable from others and may be used normally 
either on the right or left of an assignment arrow. 
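
The behaviour just described can be modelled loosely outside APL. The following Python sketch is an illustration only and not the APL\360 mechanism: the class `SharedVariable`, the processor labels, and the function `partner_step` are hypothetical names, and all questions of access control and synchronization are ignored. It shows only that a value read by one side need bear no fixed relationship to the value that side last assigned.

```python
class SharedVariable:
    """One value cell visible to two cooperating processors (hypothetical model)."""

    def __init__(self):
        self.value = None
        self.last_setter = None  # which processor assigned the current value

    def assign(self, processor, value):
        # Corresponds to the variable appearing to the left of an assignment arrow.
        self.value = value
        self.last_setter = processor

    def reference(self):
        # Corresponds to an ordinary use of the variable in an expression.
        return self.value


def partner_step(var):
    """Hypothetical partner processor: reads the shared value and replaces it."""
    request = var.reference()
    var.assign("partner", request.upper())


x = SharedVariable()
x.assign("apl", "read gauge 3")   # the APL side specifies a value
partner_step(x)                    # the other processor acts on it
reply = x.reference()              # the APL side reads a value it did not set
```

Here `reply` holds whatever the partner left in the variable, just as a value read from the quote-quad is whatever the user typed.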

Although motivated most strongly at the time by a 
need to provide a “file and I/O” capability for APL\360,
the shared variable facility satisfied other needs as well,
a significant criterion for the inclusion of a new feature
in the language. It provides for general communication,
not only between APL and the host system, but also
between APL programs running concurrently at different 
terminals, which is in a sense a more fundamental use of 
the idea. 

Perhaps as important as the practical use of the facility
is the potency that an implementation lends to the
concept of shared variables as a basis for understanding 
communication in any system. With respect to APL\360,
for example, we had long used the term “distinguished 
variable” in discussing the interface between APL and 
system, meaning thereby variables, like trace and stop 
vectors, which hold control or state information. It is 
now clear that “distinguished variables” are shared vari- 
ables, distinguished from ordinary variables by the fact 
of their being shared, and further qualified by their 
membership in a particular interface. In principle, the
environment and resources of APL\360 could be com- 
pletely controlled through the use of an appropriate set 
of such distinguished variables. 


System functions 

In a given application area it is usually easier to work 
with APL augmented by defined functions, designed to 
embody the significant concepts of the area, than with 
the primitive functions of the language alone. Such de- 
fined functions, together with the relevant variables or 
data objects, constitute an application language, or appli- 
cation extension. Managing the resources or environ- 
ment of an APL computing system is a particular applica- 
tion, in which the data objects are the distinguished vari-
ables that define the interface between APL and system. 

For convenience, the defined functions constituting an 
application extension for system management should 
behave differently from other defined functions, at least 
to the extent of being available at all times, like the prim- 
itives, without having to be copied from workspace to 
workspace. Such ubiquity requires that the names of 
these functions be distinguished from those a user might 
invent. This distinction can only be made, if APL is to 
remain essentially context independent, by the establish- 
ment of a class of reserved names. This class has been 
defined as names starting with the quad character, and 
functions having such names are called system functions. 
A similar naming convention applies to distinguished 
variables, or system variables, as they are now called. 

In principle, system functions work with system vari- 
ables that are independently identifiable. In practice, 
the system variables in a particular situation may not 
be available explicitly, and the system functions may 
be locked. This can come about because direct access to 
the interface by the user is deemed undesirable for tech- 
nical reasons, or because of economic considerations 
such as efficiency or protection of proprietary rights. In 
such situations system functions are superficially distin- 
guishable from primitive functions only by virtue of the 
naming convention. 


The present I-beam functions behave like system 
functions. Fortunately, there are only two of them: the 
monadic function that is familiar to all users of APL, and 
the dyadic function that is still known mostly to system 
programmers. Despite their usefulness, these functions 
are hardly to be taken as examples of good application 
language design, depending as they do on arbitrary nu- 
merical arguments to give them meaning, and having no 
meaningful relationships with each other. The monadic 
I-beams are more like read-only variables — changeable 
constants, as it were — than functions. Indeed, except for 
their syntax, they behave precisely like shared variables 
where the processor on the other side replaces the value 
between each reference on the APL side. 


The shared variable facility itself requires communica- 
tion between APL and system in order to establish a de- 
sired interface between APL and cooperating processors. 
The prospect of inventing new system commands for 
this, or otherwise providing an ad hoc facility, was most 
distasteful, and consideration of this problem was a major
factor in leading toward the system function concept.
It was taken as an indication of the validity of the shared 
variable approach to communication when the solution 
to the problem it engendered was found within the con- 
ceptual framework it provided, and this solution also 
proved to be a basis for clarifying the role of facilities 
already present. 


In due course a set of system functions must be de- 
signed to parallel the facilities now provided by system 
commands and go beyond them. Aside from the obvi- 
ous advantage of being dynamically executable, such a 
set of system functions will have other advantages and 
some disadvantages. The major operational advantage 
is that the system functions will be able to use the full 
power of APL to generate their arguments and exploit 
their results. Countering this, there is the fact that this 
power has a price: the automatic name isolation provided 
by the extralingual system commands will not be avail- 
able to the system functions. Names used as arguments 
will have to be presented as character arrays, which is not 
a disadvantage in programs, although it is less convenient 
for casual keyboard entry than is the use of unadorned 
names in system commands. 

A more profound advantage of system functions over 
system commands lies in the possibility of designing the 
former to work together constructively. System com- 
mands are foreclosed from this by the rudimentary na- 
ture of their syntax; they do constitute a language, but 
one having no constructive potential. 


Workspaces, files, and input-output 

The workspace organization of APL\360 libraries serves 
to group together functions and variables intended to 
work together, and to render them active or inactive as a 
group, preserving the state of the computation during 
periods of inactivity. Workspaces also implicitly qualify 
the names of objects within them, so that the same name 
may be used independently in a multiplicity of work- 
spaces in a given system. These are useful attributes; the 
grouping feature, for example, contributes strongly to 
the convenience of using APL by obviating the linkage 
problems found in other library systems. 

On the other hand, engineering decisions made early 
in the development of APL\360 determined that the
workspaces be of fixed size. This limits the size of ob- 
jects that can be managed within them and often be- 
comes an inconvenience. Consequently, as usage of 
APL\360 developed, a demand arose for a “file” facility, 
at first to work with large volumes of data under pro- 
gram control, and later to utilize data generated by other 
systems. There was also a demand to make use of high- 
speed input and output equipment. As noted in an earlier 
section, these demands led in time to the development of 
the shared variable facility. Three considerations were 
paramount in arriving at this solution. 

One consideration was the determination to maintain 
the abstract nature of APL. In particular, the use of prim- 
itive functions whose definitions depend on the repre- 
sentation of their arguments was to be avoided. This 
alone was sufficient to rule out the notion of a file as a
formal concept in the language. APL has primitive array
structures that either encompass the logical structure of
files or can be extended to do so by relatively simple 
functions defined on them. The user of APL may regard 
any array or collection of arrays as a file, and in princi- 
ple should be able to use the data so organized without 
regard to the medium on which these arrays may be 
stored. 

The second consideration was the not uncommon 
observation that files are used in two ways, as a medium 
for exchange of information and as a dynamic exten- 
sion of working storage during computation (see Falkoff 
[14]). In keeping with the principle just noted, the 
proper solution to the second problem must ultimately 
be the removal of workspace size limitations, and this 
will probably be achieved in the course of general de- 
velopments in the industry. We saw no prospect of a sat- 
isfactory direct solution being achieved locally in a 
reasonable time, so attention was concentrated on the 
first problem in the expectation that, with a good general 
communication facility, on-line storage devices could be 
used for workspace extension at least as effectively as 
they are so used in other systems. 


The third consideration was one of generality. One 
possible approach to the communication problem would 
have been to increase the roster of system commands 
and make them dynamically executable, or add varia- 
tions to the I-beam functions to manage specific storage 
media and I/O equipment or access methods. But in ad- 
dition to being unpleasant because of its ad hoc nature, 
this approach did not promise to be general enough. In 
working interactively with large collections of data, for 
example, the possible functional variations are almost 
limitless. Various classes of users may be allowed ac- 
cess for different purposes under a variety of controls, 
and unless it is intended to impose restrictive constraints 
ahead of time, it is futile to try to anticipate the solutions 
to particular problems. Thus, to provide a communica- 
tion facility by accretion appeared to be an endless task. 


The shared variable approach is general enough be- 
cause, by making the interface explicitly available with 
primitive controls on the behavior of the shared variable, 
it provides only the basic communication mechanism. It 
then remains for the specific problem to be managed by 
bringing to bear on it the full power of APL on one side, 
and that of the host system on the other. The only re- 
maining question is one of performance: does the shared 
variable concept provide the basis for an effective imple- 
mentation? This question has been answered affirma- 
tively as a result of direct experimentation. 

The net effect of this approach has been to provide for 
APL an application extension comprising the few system 
functions necessary to manage shared variables. Actual 
file or I/O applications are managed, as required, by 
user-defined functions. The system functions are used
only to establish sharing, and the shared variables are 
then used for the actual transfer of information between 
APL workspaces and file or I/O processors. 


Appendix. Chronology of APL development 

The development of APL was begun in 1957 as a neces- 
sary tool for writing clearly about various topics of inter- 
est in data processing. The early development is de- 
scribed in the preface of Iverson [10] and Brooks and 
Iverson [15]. Falkoff became interested in the work 
shortly after Iverson joined IBM in 1960, and used the 
language in his work on parallel search memories [16]. 
In early 1963 Falkoff began work on a formal descrip- 
tion of System/360 in APL and was later joined in this 
work by Iverson and Sussenguth [2]. 

Throughout this early period the language was used 
by both Falkoff and Iverson in the teaching of various 
topics at various universities and at the IBM Systems 
Research Institute. Early in 1964 Iverson began using it 
in a course in elementary functions at the Fox Lane 
High School in Bedford, New York, and in 1966 pub- 
lished a text that grew out of this work [8]. John L. 
Lawrence (who, as editor of the IBM Systems Journal, 
procured and assisted in the publication of the formal 
description of System/360) became interested in the use 
of APL at high school and college level and invited the 
authors to consult with him in the development of cur- 
riculum material based on the use of computers. This 
work led to the preparation of curriculum material in a 
number of areas and to the publication of an APL\360
Reference Manual by Sandra Pakin [17]. 

Although our work through 1964 had been focused on 
the language as a tool for communication among people, 
we never doubted that the same characteristics which 
make the language good for this purpose would make it 
good for communication with a machine. In 1963 Her- 
bert Hellerman implemented a portion of the language 
on an IBM/1620 as reported in [18]. Hellerman’s sys- 
tem was used by students in the high school course with 
encouraging results. This, together with our earlier work 
in education, heightened our interest in a full-scale imple- 
mentation. 

When the work on the formal description of Sys- 
tem/360 was finished in 1964 we turned our attention to 
the problem of implementation. This work was brought 
to rapid fruition in 1965 when Lawrence M. Breed 
joined the project and, together with Philip S. Abrams, 
produced an implementation on the 7090 by the end of 
1965. Influenced by Hellerman’s interest in time-sharing 
we had already developed an APL typing element for the 
IBM 1050 computer terminal. This was used in early 
1966 when Breed adapted the 7090 system to an experi- 
mental time-sharing system developed under Andrew 
Kinslow, allowing us the first use of APL in the manner
familiar today. By November 1966, the system had been 
reprogrammed for System/360 and APL service has been 
available within IBM since that date. The system became
available outside IBM in 1968.

A paper by Falkoff and Iverson [3] provided the first 
published description of the APL\360 system, and a 
companion paper by Breed and Lathwell [19] treated 
the implementation. R. H. Lathwell joined the design 
group in 1966 and has since been concerned primarily 
with the implementations of APL and with the use of APL 
itself in the design process. In 1971 he published, to- 
gether with Jorge Mezei, a formal definition of APL in 
APL [9]. 

The APL\360 System benefited from the contributions 
of many outside of the central design group. The preface 
to the User’s Manual [1] acknowledges many of these 
contributions. 

This paper is a discussion of the
evolution of the APL language, and it 
treats implementations and applications 
only to the extent that they appear to have 
exercised a major influence on that 
evolution. Other sources of historical 
information are cited in References 1-3; in 
particular, The Design of APL [1] provides 
supplementary detail on the reasons behind 
many of the design decisions made in the
development of the language. Readers 
requiring background on the current 
definition of the language should consult 
APL Language [4].


Although we have attempted to confirm 
our recollections by reference to written 
documents and to the memories of our 
colleagues, this remains a personal view 
which the reader should perhaps supplement 
by consulting the references provided. In 
particular, much information about 
individual contributions will be found in 
the Appendix to The Design of APL [1], and 
in the Acknowledgements in A Programming 
Language [10] and in APL\360 User's Manual 
[23]. Because Reference 23 may no longer 
be readily available, the acknowledgements 
from it are reprinted in Appendix A. 


McDonnell's recent paper on the 
development of the notation for the 
circular functions [5] shows that the 
detailed evolution of any one facet of the 
language can be both interesting and 
illuminating. Too much detail in the 
present paper would, however, tend to 
obscure the main points, and we have 
therefore limited ourselves to one such 
example. We can only hope that other 
contributors will publish their views on 
the detailed developments of other facets 
of the language, and on the development of 
various applications of it. 


The development of the language was 
first begun by Iverson as a tool for 
describing and analyzing various topics in 
data processing, for use in teaching 
classes, and in writing a book, Automatic
Data Processing [6], undertaken together
with Frederick P. Brooks, Jr., then a 
graduate student at Harvard. Because the 
work began as incidental to other work, it 
is difficult to pinpoint the beginning, but 
it was probably early 1956; the first 
explicit use of the language to provide 
communication between the designers and 
programmers of a complex system occurred 
during a leave from Harvard spent with the 
management consulting firm of McKinsey and 
Company in 1957. Even after others were 
drawn into the development of the language, 
this development remained largely 
incidental to the work in which it was 
used. For example, Falkoff was first 
attracted to it (shortly after Iverson 
joined IBM in 1960) by its use as a tool in 
his work in parallel search memories [7], 
and in 1964 we began to plan an 
implementation of the language to enhance 
its utility as a design tool, work which 
came to fruition when we were joined by 
Lawrence M. Breed in 1965.


The most important influences in the 
early phase appear to be Iverson's 
background in mathematics, his thesis work 
in the machine solutions of linear 
differential equations [8] for an economic
input-output model proposed by Professor 
Wassily Leontief (who, with Professor 
Howard Aiken, served as thesis adviser), 
and Professor Aiken's interest in the 
newly-developing field of commercial 
applications of computers. Falkoff brought 
to the work a background in engineering and 
technical development, with experience in a
number of disciplines, which had left him
convinced of the overriding importance of
simplicity, particularly in a field as
subject to complication as data processing. 


Although the evolution has been 
continuous, it will be helpful to 
distinguish four phases according to the 
major use or preoccupation of the period: 
academic use (to 1960), machine description 
(1961-1963), implementation (1964-1968), 
and systems (after 1968). 

1. ACADEMIC USE


The machine programming required in 
Iverson's thesis work was directed at the 
development of a set of subroutines 
designed to permit convenient 
experimentation with a variety of 
mathematical methods. This implementation 
experience led to an emphasis on 
implementable language constructs, and to 
an understanding of the role of the 
representation of data. 


The mathematical background shows 
itself in a variety of ways, notably: 


1. In the use of functions with 
explicit arguments and explicit results; 
even the relations (< ≤ = ≥ > ≠) are
treated as such functions. 


2. In the use of logical functions and 
logical variables. For example, the 
compression function (denoted by /) uses 
as one argument a logical vector which 
is, in effect, the characteristic vector 
of the subset selected by compression. 


3. In the use of concepts and 
terminology from tensor analysis, as in 
inner product and outer product and in 
the use of rank for the "dimensionality" 
of an array, and in the treatment of a
scalar as an array of rank zero.


4. In the emphasis on generality. For
example, the generalizations of
summation (by F/), of inner product (by
F.G), and of outer product (by ∘.F)
extended the utility of these functions 
far beyond their original area of 
application. 


5. In the emphasis on identities 
(already evident in [9]) which makes the 
language more useful for analytic 
purposes, and which leads to a uniform 
treatment of special cases as, for 
example, the definition of the reduction 
of an empty vector, first given in A
Programming Language [10].
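
The generalizations named in item 4 can be illustrated with a small Python sketch. It is a model under stated assumptions, not a definition of the APL primitives: the function names `reduction`, `inner_product`, and `outer_product` are chosen here for illustration, only vectors are handled, and the reduction of an empty vector mentioned in item 5 is not covered.

```python
from functools import reduce
import operator


def reduction(f, v):
    """F/V for a non-empty vector, grouped from the right: v0 f (v1 f (... f vn))."""
    return reduce(lambda acc, x: f(x, acc), reversed(v))


def inner_product(f, g, a, b):
    """A F.G B for two vectors of equal length."""
    return reduction(f, [g(x, y) for x, y in zip(a, b)])


def outer_product(f, a, b):
    """A table of f applied to every pair of elements, one from a and one from b."""
    return [[f(x, y) for y in b] for x in a]


total = reduction(operator.add, [3, 1, 4])        # ordinary summation
largest = reduction(max, [3, 1, 4])               # same pattern, another function
dot = inner_product(operator.add, operator.mul, [1, 2, 3], [4, 5, 6])
least_sum = inner_product(min, operator.add, [1, 2, 3], [4, 5, 6])
table = outer_product(operator.mul, [1, 2, 3], [1, 2])
```

Substituting a different function for `operator.add` or `operator.mul` is all that is needed to move from the familiar sum and dot product to other useful cases, which is the sense in which the generalization extends the utility of these functions.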


In 1954 Harvard University published 
an announcement [11] of a new graduate 
program in Automatic Data Processing 
organized by Professor Aiken. (The program 
was also reported in a conference on 
computer education [12]). Iverson was one 
of the new faculty appointed to prosecute 
the program; working under the guidance of 
Professor Aiken in the development of new 
courses provided a stimulus to his interest 
in developing notation, and the diversity 
of interests embraced by the program 
promoted a broad view of applications. 


The state of the language at the end 
of the academic period is best represented 
by the presentation in A Programming 
Language [10], submitted for publication in 
early 1961. The evolution in the latter
part of the period is best seen by 
comparing references 9 and 10. This 
comparison shows that reduction and inner 
and outer product were all introduced in 
that period, although not then recognized 
as a class later called operators. It also 
shows that specification was originally (in 
Reference 9) denoted by placing the 
specified name at the right, as in P+Q→Z.
The arguments (due in part to F.P. Brooks,
Jr.) which led to the present form (Z←P+Q)
were that it better conformed to the
mathematical form Z=P+Q, and that in
reading a program, any backward reference 
to determine how a given variable was 
specified would be facilitated if the 
specified variables were aligned at the 
left margin. What this comparison does not 
show is the removal of a number of special 
comparison functions (such as the 
comparison of a vector with each row of a 
matrix) which were seen to be unnecessary 
when the power of the inner product began 
to be appreciated, as in the expression 
M∧.=V. This removal provides one example
of the simplification of the language 
produced by generalizations. 
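
As an illustration of how a general inner product makes a special row-comparison function unnecessary, the following Python sketch models the comparison of a vector with each row of a matrix. The names `matrix_vector_inner` and `equal` are hypothetical, matrices are represented as lists of rows, and the sketch covers only the matrix-by-vector case.

```python
from functools import reduce
import operator


def matrix_vector_inner(f, g, m, v):
    """M F.G V: combine each row of m with v using g, then reduce the row with f."""
    def reduce_right(items):
        return reduce(lambda acc, x: f(x, acc), reversed(items))
    return [reduce_right([g(x, y) for x, y in zip(row, v)]) for row in m]


def equal(x, y):
    return int(x == y)          # relations yield 1 or 0


M = [[1, 2, 3],
     [4, 5, 6],
     [1, 2, 3]]
V = [1, 2, 3]

# "and" over "equals": 1 for each row of M that matches V exactly
row_matches = matrix_vector_inner(operator.and_, equal, M, V)
# "plus" over "times": the ordinary matrix-vector product, from the same function
row_dots = matrix_vector_inner(operator.add, operator.mul, M, V)
```

The same function serves both purposes; only the pair of scalar functions supplied to it changes.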


2. MACHINE DESCRIPTION


The machine description phase was 
marked by the complete or partial 
description of a number of computer 
systems. The first use of the language to 
describe a complete computing system was 
begun in early 1962 when Falkoff discussed 
with Dr. W.C. Carter his work in the 
standardization of the instruction set for 
the machines that were to become the IBM 
System/360 family. Falkoff agreed to 
undertake a formal description of the 
machine language, largely as a vehicle for 
demonstrating how parallel processes could 
be rigorously represented. He was later 
joined in this work by Iverson when he 
returned from a short leave at Harvard, and 
still later by E.H. Sussenguth. This work 
was published as "A Formal Description of 
System/360" [13]. 


This phase was also marked by a 
consolidation and regularization of many 
aspects which had little to do with machine 
description. For example, the cumbersome 
definition of maximum and minimum (denoted 
in Reference 10 by U⌈V and U⌊V and
equivalent to what would now be written as
⌈/U/V and ⌊/U/V) was replaced, at the
suggestion of Herbert Hellerman, by the
present simple scalar functions. This
simplification was deemed practical because 
of our increased understanding of the 
potential of reduction and inner and outer 
product. 


The best picture of the evolution in 
this period is given by a comparison of A
Programming Language [10] on the one hand,
and "A Formal Description of System/360"
[13] and "Formalism in Programming
Languages" [14] on the other. Using
explicit page references to Reference 10, 
we will now give some further examples of 
regularization during this period: 


1. The elimination of embracing symbols
(such as |X| for absolute value, ⌊X⌋ for
floor, and ⌈X⌉ for ceiling) and
replacement by the leading symbol only,
thus unifying the syntax for monadic
functions.


2. The conscious use of a single 
function symbol to represent both a 
monadic and a dyadic function (still 
referred to in Reference 10 as unary and 
binary). 


3. The adoption of multi-character 
names which, because of the failure 
(page 11) to insist on no elision of the 
times sign, had been permitted (page 10) 
only with a special indicator. 


4. The rigorous adoption of a 
right-to-left order of execution which, 
although stated (page 8), had been
violated by the unconscious application 
of the familiar precedence rules of 
mathematics. Reasons for this choice 
are presented in Elementary Functions 
[15], in Berry's APL\360 Primer [16], 
and in The Design of APL [1].


5. The concomitant definition of 
reduction based on a right-to-left order 
of execution as opposed to the opposite 
convention defined on page 16. 


6. Elimination of the requirement for 
parentheses surrounding an expression 
involving a relation (page 11). An 
example of the use without parentheses 
occurs near the bottom of page 241 of 
Reference 13. 


7. The elimination of implicit
specification of a variable (that is,
the specification of some function of
it, as in the expression ⊥S←2 on page
81), and its replacement by an explicit
inverse function (⊤ in the cited
example).
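
The effect of the right-to-left convention on reduction (items 4 and 5) shows up only for functions that are not associative. The Python sketch below contrasts the two groupings for subtraction; the function names are hypothetical and the example vector is arbitrary.

```python
from functools import reduce
import operator


def reduce_right_to_left(f, v):
    """v0 f (v1 f (... f vn)), the grouping implied by right-to-left execution."""
    return reduce(lambda acc, x: f(x, acc), reversed(v))


def reduce_left_to_right(f, v):
    """((v0 f v1) f v2) ... f vn, the opposite convention."""
    return reduce(f, v)


v = [1, 2, 3, 4]
right = reduce_right_to_left(operator.sub, v)   # 1-(2-(3-4))
left = reduce_left_to_right(operator.sub, v)    # ((1-2)-3)-4
alternating = 1 - 2 + 3 - 4                     # the alternating sum of v

same_for_addition = (reduce_right_to_left(operator.add, v)
                     == reduce_left_to_right(operator.add, v))
```

Under the right-to-left grouping the subtraction reduction equals the alternating sum, whereas the left-to-right grouping subtracts every later element from the first; for an associative function such as addition the two conventions agree.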


Perhaps the most important 
developments of this period were in the use 
of a collection of concurrent autonomous 
programs to describe a system, and the 
formalization of shared variables as the 
means of communication among the programs. 
Again, comparisons may be made between the 
system of programs of Reference 13, and the 
more informal use of concurrent programs 
introduced on page 88 of Reference 10. 


It is interesting to note that the 
need for a random function (denoted by the 
question mark) was first felt in describing
the operation of the computer itself. The
architects of the IBM System/360 wished to 
leave to the discretion of the designers of 
the individual machines of the 360 family 
the decision as to what was to be found in 
certain registers after the occurrence of 
certain errors, and this was done by 
stating that the result was to be random. 
Recognizing more general use for the 
function than the generation of random 
logical vectors, we subsequently defined 
the monadic question mark function as a 
scalar function whose argument specified 
the population from which the random 
elements were to be chosen. 


3. IMPLEMENTATION


In 1964 a number of factors conspired 
to turn our attention seriously to the 
problem of implementation. One was the 
fact that the language was by now 
sufficiently well-defined to give us some 
confidence in its suitability for 
implementation. The second was the 
interest of Mr. John L. Lawrence who, after 
managing the publication of our description 
of System/360, asked for our consultation 
in utilizing the language as a tool in his 
new responsibility (with Science Research 
Associates) for developing the use of 
computers in education. We quickly agreed 
with Mr. Lawrence on the necessity for a 
machine implementation in this work. The 
third was the interest of our then manager, 
Dr. Herbert Hellerman, who, after 
initiating some implementation work which 
did not see completion, himself undertook 
an implementation of an array-based 
language which he reported in the 
Communications of the ACM [17]. Although 
this work was limited in certain important 
respects, it did prove useful as a teaching 
tool and tended to confirm the feasibility 
of implementation. 


Our first step was to define a 
character set for APL. Influenced by Dr. 
Hellerman's interest in time-sharing 
systems, we decided to base the design on 
an 88-character set for the IBM 1050 
terminal, which utilized the
easily-interchanged Selectric® typing 
element. The design of this character-set 
exercised a surprising degree of influence 
on the development of the language. 


As a practical matter it was clear 
that we would have to accept a 
linearization of the language (with no 
superscripts or subscripts) as well as a 
strict limit on the size of the primary 
character set. Although we expected these 
limitations to have a deleterious effect, 
and at first found unpleasant some of the 
linearity forced upon us, we now feel that 
the changes were beneficial, and that many 
led to important generalizations. For 
example: 

1. On linearizing indexing we
realized that the sub- and 
super-script form had inhibited the 
use of arrays of rank greater than 2, 
and had also inhibited the use of 
several levels of indexing; both 
inhibitions were relieved by the 
linear form A[I;J;K].


2. The linearization of the inner
and outer product notation (from M×N
with a + or a ∘ written above the ×,
to M+.×N and M∘.×N) led
eventually to the recognition of the 
operator (which was now represented 
by an explicit symbol, the period) as 
a separate and important component of 
the language. 


3. Linearization led to a
regularization of many functions of
two arguments (such as N⍺J for ⍺ʲ(n)
and A*B for aᵇ) and to the
redefinition of certain functions of
two or three arguments so as to
eliminate one of the arguments. For
example, ⍳ʲ(n) was replaced by ⍳N,
with the simple expression J+⍳N
replacing the original definition.
Moreover, the simple form ⍳N led to
the recognition that J≥⍳N could
replace N⍺J (for J a scalar) and that
J∘.≥⍳N could generalize N⍺J in a
useful manner; as a result the
functions ⍺ and ⍵ were eventually
withdrawn.


4. The limitation of the character 
set led to a more systematic 
exploitation of the notion of 
ambiguous valence, the representation 
of both a monadic and a dyadic 
function by the same symbol. 


5. The limitation of the character 
set led to the replacement of the two 
functions for the number of rows and 
the number of columns of an array, by 
the single function (denoted by ⍴)
which gave the dimension vector of
the array. This provided the
necessary extension to arrays of
arbitrary rank, and led to the simple
expression ⍴⍴A for the rank of A.
The resulting notion of the dimension
vector also led to the definition of
the dyadic reshape function D⍴X.


6. The limitation to 88 primary 
characters led to the important 
notion of composite characters formed 
by striking one of the basic 
characters over another. This scheme 
has provided a supply of easily-read 
and easily-written symbols which were
needed as the language developed 
further. For example, the quad, 
overbar, and circle were included not 
for specific purposes but because 
they could be used to overstrike many 
characters. The overbar by itself 
also proved valuable for the
representation of negative numbers, 
and the circle proved convenient in 
carrying out the idea, proposed by 
E.E. McDonnell, of representing the 
entire family of (monadic) circular 
functions by a single dyadic 
function. 


7. The use of multiple fonts had to 
be re-examined, and this led to the 
realization that certain functions 
were defined not in terms of the 
value of the argument alone, but also 
in terms of the form of the name of 
the argument. Such dependence on the 
forms of names was removed. 


We did, however, include 
characters which could print above 
and below alphabetics to provide for 
possible font distinctions. The 
original typing element included both 
the present flat underscore, and a 
saw-tooth one (the pralltriller as 
shown, for example, in Webster's 
Second), and a hyphen. In practice, 
we found the two underscores somewhat 
difficult to distinguish, and the 
hyphen very difficult to distinguish 
from the minus, from which it
differed only in length. We 
therefore made the rather costly 
change of two characters, 
substituting the present delta and 
del (inverted delta) for the 
pralltriller and the hyphen. 
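
The dimension vector, the rank obtained by applying the same function twice, and the dyadic reshape of item 5 can be modelled with nested Python lists. This is a sketch under stated assumptions: arrays are taken to be rectangular nested lists, the names `shape`, `rank`, and `reshape` are chosen for illustration, `reshape` assumes a non-empty list of values, and the cyclic reuse of elements follows the behaviour of reshape in present-day APL rather than anything stated in the text above.

```python
from itertools import cycle, islice


def shape(a):
    """Dimension vector of a rectangular nested list; a scalar has shape []."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0] if a else None
    return dims


def rank(a):
    """Rank as the shape of the shape, mirroring the double application of one function."""
    return shape(shape(a))[0]


def reshape(dims, values):
    """Build an array with the given dimension vector, reusing values cyclically."""
    total = 1
    for d in dims:
        total *= d
    flat = list(islice(cycle(values), total))

    def build(ds, items):
        if not ds:
            return items[0]          # no dimensions left: a scalar
        if ds[0] == 0:
            return []                # an empty axis
        step = len(items) // ds[0]
        return [build(ds[1:], items[i * step:(i + 1) * step])
                for i in range(ds[0])]

    return build(list(dims), flat)


table = reshape([2, 3], [1, 2, 3, 4])
cube = reshape([2, 3, 4], [0])
```

A single function giving the dimension vector serves arrays of any rank, where separate row-count and column-count functions would have stopped at matrices.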


In the placement of the character set 
on the keyboard we were subject to a number 
of constraints imposed by the two forms of 
the IBM 2741 terminal (which differed in 
the encoding from keyboard-position to 
element-position), but were able to devise 
a grouping of symbols which most users find 
easy to learn. One pleasant surprise has 
been the discovery that numbers of people 
who do not use APL have adopted the type 
element for use in mathematical typing. 

The first publication of the character set 
appears to be in Elementary Functions [15]. 


Implementation led to a new class of 
questions, including the formal definition 
of functions, the localization and scope of 
names, and the use of tolerances in 
comparisons and in printing output. It 
also led to systems questions concerning 
the environment and its management, 
including the matter of libraries and 
certain parameters such as index origin, 
printing precision, and printing width. 


Two early decisions set the tone of 
the implementation work: 1) The 
implementation was to be experimental, with 
primary emphasis on flexibility to permit 
experimentation with language concepts, and 
with questions of execution efficiency 
subordinated, and 2) The language was to be 
compromised as little as possible by 
machine considerations. 


These considerations led Breed and 
P.S. Abrams (both of whom had been 
attracted to our work by Reference 13) to 
propose and build an interpretive 
implementation in the summer of 1965. This 
was a batch system with punched card input, 
using a multi-character encoding of the 
primitive function symbols. It ran on the 
IBM 7090 machine and we were later able to 
experiment with it interactively, using the 
typeball previously designed, by placing 
the interpreter under an experimental time 
sharing monitor (TSM) available on a 
machine in a nearby IBM facility. 


TSM was available to us for only a 
very short time, and in early 1966 we began 
to consider an implementation on 
System/360, work that started in earnest in 
July and culminated in a running system in 
the fall. The fact that this interpretive 
and experimental implementation also proved 
to be remarkably practical and efficient is 
a tribute to the skill of the implementers, 
recognized in 1973 by the award to the 
principals (L.M. Breed, R.H. Lathwell, and 
R.D. Moore) of ACM's Grace Murray Hopper 
Award. The fact that the many APL 
implementations continue to be largely 
interpretive may be attributed to the array 
character of the language which makes 
possible efficient interpretive execution. 


We chose to treat the occurrence of a 
statement as an order to evaluate it, and 
rejected the notion of an explicit function 
to indicate evaluation. In order to avoid 
the introduction of "names" as a distinct 
object class, we also rejected the notion 
of "call by name". The constraints imposed 
by this decision were eventually removed in 
a simple and general way by the 
introduction of the execute function, which 
served to execute its character string 
argument as an APL expression. The 
evolution of these notions is discussed at 
length in the section on "Execute and 
Format" in The Design of APL [1]. 


In earlier discussions with a number 
of colleagues, the introduction of 
declarations into the language was urged 
upon us as a requisite for implementation. 
We resisted this on the general basis of 
simplicity, but also on the basis that 
information in declarations would be 
redundant, or perhaps conflicting, in a 
language in which arrays are primitive. 

The choice of an interpretive 
implementation made the exclusion of 
declarations feasible, and this, coupled 
with the determination to minimize the 
influence of machine considerations such as 
the internal representations of numbers on 
the design of the language, led to an early 
decision to exclude them. 


In providing a mechanism by which a 
user could define a new function, we wished 
to provide six forms in all: functions with 
0, 1, or 2 explicit arguments, and 
functions with 0 or 1 explicit results. 
This led to the adoption of a header for 
the function definition which was, in 
effect, a paradigm for the way in which a 
function was used. For example, a function 
F of two arguments having an explicit 
result would typically be used in an 
expression such as Z←A F B, and this was 
the form used for the header. 


The names for arguments and results 
in the header were of course made local to 
the function definition, but at the outset 
no thought was given to the localization of 
other names. Fortunately, the design of 
the interpreter made it relatively easy to 
localize the names by adding them to the 
header (separated by semicolons), and this 
was soon done. Names so localized were 
strictly local to the defined function, and 
their scope did not extend to any other 
functions used within it. It was not until 
the spring of 1968 when Breed returned from 
a talk by Professor Alan Perlis on what he 
called "dynamic localization" that the 
present scheme was adopted, in which name 
scopes extend to functions called within a 
function. 
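
The difference between the two schemes can 
be sketched outside APL. The Python model 
below is illustrative only: the name 
tables, the functions F and G, and the 
helper names are hypothetical, and it makes 
no claim about how the interpreter stored 
names. 

```python
# Hypothetical model of name lookup; not taken from any APL interpreter.
GLOBALS = {"X": "global"}


def lookup_strict(name, frame):
    """Early rule: a localized name is visible only in the function declaring it."""
    return frame.get(name, GLOBALS[name])


def lookup_dynamic(name, stack):
    """Later rule: search the chain of callers, innermost first, then globals."""
    for frame in reversed(stack):
        if name in frame:
            return frame[name]
    return GLOBALS[name]


def run(dynamic):
    """F localizes X and calls G; G localizes nothing and refers to X."""
    f_frame = {"X": "local to F"}  # F's header names X after a semicolon
    g_frame = {}                   # G has no local names
    stack = [f_frame, g_frame]     # G is executing, having been called from F
    if dynamic:
        return lookup_dynamic("X", stack)
    return lookup_strict("X", g_frame)
```

Under the early rule G sees the global X; 
under dynamic localization it sees the X 
localized by its caller F. 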


We recognized that the finite limits 
on the representation of numbers imposed by 
an implementation would raise problems 
which might require some compromise in the 
definition of the language, and we tried to 
keep these compromises to a minimum. For 
example, it was clear that we would have to 
provide both integer and floating point 
representations of numbers and, because we 
anticipated use of the system in logical 
design, we wished to provide an efficient 
(one bit per element) representation of 
logical arrays as well. However, at the 
cost of considerable effort and some loss 
of efficiency, both well worthwhile, the 
transitions between representations were 
made to be imperceptible to the user, 
except for secondary effects such as 
storage requirements. 


Problems such as overflow (i.e., a 
result outside the range of the 
representations available) were treated as 
domain errors, the term domain being 
understood as the domain of the machine 
function provided, rather than as the 
domain of the abstract mathematical 
function on which it was based. 


One difficulty we had not anticipated 
was the provision of sensible results for 
the comparison of quantities represented to 
a limited precision. For example, if X and 
Y were specified by Y←2÷3 and X←3×Y, then 
we wished to have the comparison 2=X yield 
1 (representing true) even though the 
representation of the quantity X would 
differ slightly from 2. 


This was solved by introducing a 
comparison tolerance (christened fuzz by 
L.M. Breed, who knew of its use in the Bell 
Interpreter [18]) which was multiplied by 
the larger in magnitude of the arguments to 
give a tolerance to be applied in the 
comparison. This tolerance was at first 
fixed (at 1E¯13) and was later made 
specifiable by the user. The matter has 
proven more difficult than we first 
expected, and discussion of it still 
continues [19, 20]. 
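
The rule just described can be sketched in 
Python (not APL). The function name and the 
test values are hypothetical, and the fuzz 
is assumed to be 1e-13, the fixed value 
mentioned above; the sketch shows only the 
relative-tolerance idea, not the behavior 
of any particular APL system. 

```python
FUZZ = 1e-13  # assumed to match the fixed tolerance mentioned in the text


def tolerant_equal(a, b, fuzz=FUZZ):
    """True when a and b differ by no more than fuzz times the larger magnitude."""
    return abs(a - b) <= fuzz * max(abs(a), abs(b))


x = 0.1 + 0.2                      # not exactly 0.3 in binary floating point
exact = x == 0.3                   # strict comparison of the representations
tolerant = tolerant_equal(x, 0.3)  # comparison with fuzz
```

Because the tolerance scales with the 
larger argument, the test is relative: the 
same fuzz serves for quantities near 1 and 
for quantities near a million. 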


A related, but less serious, question 
was what to do with the rational root of a 
negative number, a question which arose 
because the exponent (as in the expression 
¯8*2÷3) would normally be presented as an 
approximation to a rational. Since we 
wished to make the mathematics behave "as 
you thought it did in high school" we 
wished to treat such cases properly at 
least for rationals with denominators of 
reasonable size. This was achieved by 
determining the result sign by a continued 
fraction expansion of the right argument 
(but only for negative left arguments) and 
worked for all denominators up to 80 and 
"most" above. 
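
A sketch of the idea in Python (not APL) 
follows. The details of the original 
algorithm are not given here, so the 
choices below are assumptions: the standard 
library call Fraction.limit_denominator 
stands in for the continued fraction 
expansion, the closeness check and the name 
power are hypothetical, and exponents that 
reduce to an even denominator are simply 
rejected. 

```python
from fractions import Fraction


def power(base, exponent, max_denominator=80):
    """Raise base to exponent; a negative base gets a real result when the
    exponent is very close to a rational p/q with a small odd q."""
    if base >= 0:
        return base ** exponent
    # Closest fraction with a bounded denominator (a stand-in for the
    # continued fraction expansion mentioned in the text).
    ratio = Fraction(exponent).limit_denominator(max_denominator)
    close = abs(float(ratio) - exponent) <= 1e-13 * max(1.0, abs(exponent))
    if not close or ratio.denominator % 2 == 0:
        raise ValueError("domain error")
    magnitude = (-base) ** exponent
    # An odd numerator over an odd denominator keeps the negative sign.
    sign = -1 if ratio.numerator % 2 else 1
    return sign * magnitude
```

With these assumptions power(-8, 2/3) is 
close to 4 and power(-8, 1/3) is close to 
-2, while power(-8, 0.5) is rejected. 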


Most of the mathematical functions 
required were provided by programs taken 
from the work of the late Hirondo Kuki in 
the FORTRAN IV Subroutine Library. Certain 
functions (such as the inverse hyperbolics) 
were, however, not available and were 
developed, during the summers of 1967 and 
1968, by K. M. Brown, then on the faculty 
of Cornell University. 


The fundamental decision concerning 
the systems environment was the adoption of 
the concept of a workspace. As defined in 
"The APL\360 Terminal System" [21]: 


APL\360 is built around the idea of a 
workspace, analogous to a notebook, 
in which one keeps work in progress. 
The workspace holds both defined 
functions and variables (data), and 
it may be stored into and retrieved 
from a library holding many such 
workspaces. When retrieved from a 
library by an appropriate command 
from a terminal, a copy of the stored 
workspace becomes active at that 
terminal, and the functions defined 
in it, together with all the APL 
primitives, become available to the 
user. 


The three commands required for 
managing a library are "save", 
"load", and "drop", which 
respectively store a copy of an 
active workspace into a library, make 
a copy of a stored workspace active, 
and destroy the library copy of a 
workspace. Each user of the system 
has a private library into which only 
he can store. However, he may load 
a workspace from any of a number of 
common libraries, or if he is privy 
to the necessary information, from 
another user's private library. 
Functions or variables in different 
workspaces can be combined, either 
item by item or all at once, by a 
fourth command, called "copy". By 
means of three cataloging commands, a 
user may get the names of workspaces 
in his own or a common library, or 
get a listing of functions or 
variables in his active workspace. 


The language used to control the 
system functions of loading and storing 
workspaces was not APL, but comprised a set 
of system commands. The first character of 
each system command is a right parenthesis, 
which cannot occur at the left of a valid 
APL expression, and therefore acts as an 
"escape character", freeing the syntax of 
what follows. System commands were used 
for other aspects such as sign-on and 
sign-off, messages to other users, and for 
the setting and sensing of various system 
parameters such as the index origin, the 
printing precision, the print width, and 
the random link used in generating the 
pseudo-random sequence for the random 
function. 


When it first became necessary to 
name the implementation we chose the 
acronym formed from the book title A 
Programming Language [10] and, to allow a 
clear distinction between the language and 
any particular implementation of it, 
initiated the use of the machine name as 
part of the name of the implementation (as 
in APL\1130 and APL\360). Within the 
design group we had until that time simply 
referred to "the language". 


A brief working manual of the APL\360 
system was first published in November 1966 
[22], and a full manual appeared in 1968 
[23]. The initial implementation (in 
FORTRAN on an IBM 7090) was discussed by 
Abrams [24], and the time-shared 
implementation on System/360 was discussed 
by Breed and Lathwell [25]. 


3. SYSTEMS 


Use of the APL system by others in 
IBM began long before it had been completed 
to the point described in APL\360 User's 
Manual [23]. We quickly learned the 
difficulties associated with changing the 
specifications of a system already in use, 
and the impact of changes on established 
users and programs. As a result we learned 
to appreciate the importance of the 
relatively long period of development of 
the language which preceded the 
implementation; early implementation of 
languages tends to stifle radical change, 
limiting further development to the 
addition of features and frills. 


On the other hand, we also learned 
the advantages of a running model of the 
language in exposing anomalies and, in 
particular, the advantage of input from a 
large population of users concerned with a 
broad range of applications. This use 
quickly exposed the major deficiencies of 
the system. 


Some of these deficiencies were 
rectified by the generalization of certain 
functions and the addition of others in a 
process of gradual evolution. Examples 
include the extension of the catenation 
function to apply to arrays other than 
vectors and to permit lamination, and the 
addition of a generalized matrix inverse 
function discussed by M.A. Jenkins [26]. 


Other deficiencies were of a systems 
nature, concerning the need to communicate 
between concurrent APL programs (as in our 
description of System/360), to communicate 
with the APL system itself within APL 
rather than by the ad hoc device of system 
commands, to communicate with alien systems 
and devices (as in the use of file 
devices), and the need to define functions 
within the language in terms of their 
representation by APL arrays. These 
matters required more fundamental 
innovations and led to what we have called 
the system phase. 


The most pressing practical need for 
the application of APL systems to 
commercial data processing was the 
provision of file facilities. One of the 
first commercial systems to provide this 
was the File Subsystem reported by Sharp 
[27] in 1970, and defined in a SHARE 
presentation by L.M. Breed [28], and in a 
manual published by Scientific Time Sharing 
Corporation [29]. As its name implies, it 
was not an integral part of the language 
but was, like the system commands, a 
practical ad hoc solution to a pressing 
problem. 


In 1970 R.H. Lathwell proposed what 
was to become the basis of a general 
solution to many systems problems of 
APL\360, a shared variable processor [30] 
which implemented the shared variable 
scheme of communication among processors. 
This work culminated in the APLSV System 
[31] which became generally available in 
1973. 


Falkoff's "Some Implications of 
Shared Variables" [32] presents the 
essential notion of the shared variable 
system as follows: 


A user of early APL systems 
essentially had what appeared to be 
an "APL machine" at his disposal, but 
one which lacked access to the rest 
of the world. In more recent 
systems, such as APLSV and others, 
this isolation is overcome and 
communication with other users and 
the host system is provided for by 
shared variables. 


Two classes of shared variables are 
available in these systems. First, 
there is a general shared variable 
facility with which a user may 
establish arbitrary, temporary, 
interfaces with other users or with 
auxiliary processors. Through the 
latter, communication may be had with 
other elements of the host system, 
such as its file subsystem, or with 
other systems altogether. Second, 
there is a set of system variables 
which define parts of the permanent 
interface between an APL program and 
the underlying processor. These are 
used for interrogating and 
controlling the computing 
environment, such as the origin for 
array indexing or the action to be 
taken upon the occurrence of certain 
exceptional conditions. 


4. A DETAILED EXAMPLE 


At the risk of placing undue emphasis 
on one facet of the language, we will now 
examine in detail the evolution of the 
treatment of numeric constants, in order to 
illustrate how substantial changes were 
commonly arrived at by a sequence of small 
steps. 


Any numeric constant, including a 
constant vector, can be written as an 
expression involving APL primitive 
functions applied to decimal numbers as, 
for example, in 3.14×10*-5 and -2.718 and 
(3.14×10*-5),(-2.718),5. At the outset we 
permitted only non-negative decimal 
constants of the form 2.718, and all other 
values had to be expressed as compound 
statements. 


Use of the monadic negation function 
in producing negative values in vectors was 
particularly cumbersome, as in 
(-4),3,(-5),-7. We soon realized that the 
adoption of a specific "negative" symbol 
would solve the problem, and familiarity 
with Beberman's work [33] led us to the 
adoption of his "high minus" which we had, 
rather fortuitously, included in our 
character set. The constant vector used 
above could now be written as ¯4,3,¯5,¯7. 


Solution of the problem of negative 
numbers emphasized the remaining 
awkwardness of factors of the form 10*N. 
At a meeting of the principals in Chicago, 
which included Donald Mitchell and Peter 
Calingaert of Science Research Associates, 
it was realized that the introduction of a 
scaled form of constant in the manner used 
in FORTRAN would not complicate the syntax, 
and this was soon adopted. 


These refinements left one function 
in the writing of any vector constant, 
namely, catenation. The straightforward 
execution of an expression for a constant 
vector of N elements involved N-1 
catenations of scalars with vectors of 
increasing length, the handling of roughly 
.5×N×N+1 elements in all. To avoid gross 
inefficiencies in the input of a constant 
vector from the keyboard, catenation was 
therefore given special treatment in the 
original implementation. 
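
The count can be checked with a small 
Python sketch (not APL). The function name 
is hypothetical, and the model assumes that 
each catenation builds a fresh vector and 
so handles every element of its result. 

```python
def elements_handled(n):
    """Count elements handled when an n-element vector constant is built
    by n-1 catenations of a scalar with a growing vector."""
    vector = [0]                # start from the rightmost scalar
    handled = 0
    for _ in range(n - 1):
        vector = [0] + vector   # catenate one more scalar; result is a new list
        handled += len(vector)  # every element of the new vector is handled
    return handled
```

For N elements the total is 2+3+...+N, 
which is one less than .5×N×N+1 (read right 
to left, that is, half of N times N+1). 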


This system had been in use for 
perhaps six months when it occurred to 
Falkoff that since commas were not required 
in the normal representation of a matrix, 
vector constants might do without them as 
well. This seemed outrageously simple, and 
we looked for flaws. Finding none we 
adopted and implemented the idea 
immediately, but it took some time to 
overcome the habit of writing expressions 
such as (3,3)⍴X instead of 3 3⍴X. 


5. CONCLUSIONS 


Nearly all programming languages are 
rooted in mathematical notation, employing 
such fundamental notions as functions, 
variables, and the decimal (or other radix) 
representation of numbers, and a view of 
programming languages as part of the 
longer-range development of mathematical 
notation can serve to illuminate their 
development. 


Before the advent of the 
general-purpose computer, mathematical 
notation had, in a long and painful 
evolution well-described in Cajori's 
history of mathematical notations [34], 
embraced a number of important notions: 


1. The notion of assigning an 
alphabetic name to a variable or 
unknown quantity (Cajori, Secs. 
339-341). 


2. The notion of a function which 
applies to an argument or arguments 
to produce an explicit result which 
can itself serve as argument to 
another function, and the associated 
adoption of specific symbols (such as 
+ and ×) to denote the more common 
functions (Cajori, Secs. 200-233). 


3. Aggregation or grouping symbols 
(such as the parentheses) which make 
possible the use of composite 
expressions with an unambiguous 
specification of the order in which 
the component functions are to be 
executed (Cajori, Secs. 342-355). 


4. Simple, uniform representations 
for numeric quantities (Cajori, Secs. 
276-289). 


5. The treatment of quantities 
without concern for the particular 
representation used. 


6. The notion of treating vectors, 
matrices, and higher-dimensional 
arrays as entities, which had by this 
time become fairly widespread in 
mathematics, physics, and 
engineering. 


With the first computer languages 
(machine languages) all of these notions 
were, for good practical reasons, dropped; 
variable names were represented by 
"register numbers", application of a 
function (as in A+B) was necessarily broken 
into a sequence of operations (such as 
"Load register 801 into the Addend 
register, Load register 802 into the Augend 
register, etc."), grouping of operations 
was therefore non-existent, the various 
functions provided were represented by 
numbers rather than by familiar 
mathematical symbols, results depended 
sharply on the particular representation 
used in the machine, and the use of arrays, 
as such, disappeared. 


Some of these limitations were soon 
removed in early "automatic programming" 
languages, and languages such as FORTRAN 
introduced a limited treatment of arrays, 
but many of the original limitations 
remain. For example, in FORTRAN and 
related languages the size of an array is 
not a language concept, the asterisk is 
used instead of any of the familiar 
mathematical symbols for multiplication, 
the power function is represented by two 
occurrences of this symbol rather than by a 
distinct symbol, and concern with 
representation still survives in 
declarations. 


APL has, in its development, remained 
much closer to mathematical notation, 
retaining (or selecting one of) established 
symbols where possible, and employing 
mathematical terminology. Principles of 
simplicity and uniformity have, however, 
been given precedence, and these have led 
to certain departures from conventional 
mathematical notation as, for example, the 
adoption of a single form (analogous to 
3+4) for dyadic functions, a single form 
(analogous to -4) for monadic functions, 
and the adoption of a uniform rule for the 
application of all scalar functions to 
arrays. This relationship to mathematical 
notation has been discussed in The Design 
of APL [1] and in "Algebra as a Language" 
which occurs as Appendix A in Algebra: an 
algorithmic treatment [35]. 


The close ties with mathematical 
notation are evident in such things as the 
reduction operator (a generalization of 
sigma notation), the inner product (a 
generalization of matrix product), and the 
outer product (a generalization of the 
outer product used in tensor analysis). In 
other aspects the relation to mathematical 
notation is closer than might appear. For 
example, the order of execution of the 
conventional expression F G H (X) can be 
expressed by saying that the right argument 
of each function is the value of the entire 
expression to its right; this rule, 
extended to dyadic as well as monadic 
functions, is the rule used in APL. 
Moreover, the term operator is used in the 
same sense as in "derivative operator" or 
"convolution operator" in mathematics, and 
to avoid conflict it is not used as a 
synonym for function. 
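
The right-to-left rule and its consequence 
for reduction can be sketched in Python 
(not APL). The helper names are 
hypothetical, and the inner product shown 
is restricted to vectors; the sketch 
illustrates the rule only and is not a 
model of any implementation. 

```python
import operator
from functools import reduce


def reduce_right(f, items):
    """APL-style reduction: f/ a b c is a f (b f c), so that the right
    argument of each f is the value of everything to its right."""
    items = list(items)
    return reduce(lambda acc, x: f(x, acc), reversed(items[:-1]), items[-1])


def inner_product(f, g, a, b):
    """Vector case of the inner product: reduce by f the item-by-item g."""
    return reduce_right(f, [g(x, y) for x, y in zip(a, b)])


alternating = reduce_right(operator.sub, [1, 2, 3])  # 1-(2-3)
dot = inner_product(operator.add, operator.mul, [1, 2, 3], [4, 5, 6])
```

Reduction by subtraction therefore gives 
the alternating sum, and the inner product 
with plus and times is the familiar sum of 
products. 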


As a corollary we may remark that the 
other major programming languages, although 
known to the designers of APL, exerted 
little or no influence, because of their 
radical departures from the line of 
development of mathematical notation which 
APL continued. A concise view of the 
current use of the language, together with 
comments on matters such as writing style, 
may be found in Falkoff's review of the 
1975 and 1976 International APL Congresses 
[36]. 


Although this is not the place to 
discuss the future, it should be remarked 
that the evolution of APL is far from 
finished. In particular, there remain 
large areas of mathematics, such as set 
theory and vector calculus, which can 
clearly be incorporated in APL through the 
introduction of further operators. 


There are also a number of important 
features which are already in the abstract 
language, in the sense that their 
incorporation requires little or no new 
definition, but are as yet absent from most 
implementations. Examples include complex 
numbers, the possibility of defining 
functions of ambiguous valence (already 
incorporated in at least two systems 
[37, 38]), the use of user defined 
functions in conjunction with operators, 
and the use of selection functions other 
than indexing to the left of the assignment 
arrow. 


We conclude with some general 
comments, taken from The Design of APL [1], 
on principles which guided, and 
circumstances which shaped, the evolution 
of APL: 


The actual operative principles 
guiding the design of any complex 
system must be few and broad. In the 
present instance we believe these 
principles to be simplicity and 
practicality. Simplicity enters in 
four guises: uniformity (rules are 
few and simple), generality (a small 
number of general functions provide 
as special cases a host of more 
specialized functions), familiarity 
(familiar symbols and usages are 
adopted whenever possible), and 
brevity (economy of expression is 
sought). Practicality is manifested 
in two respects: concern with actual 
application of the language, and 
concern with the practical 
limitations imposed by existing 
equipment. 


We believe that the design of APL was 
also affected in important respects 
by a number of procedures and 
circumstances. Firstly, from its 
inception APL has been developed by 
using it in a succession of areas. 
This emphasis on application clearly 
favors practicality and simplicity. 
The treatment of many different areas 
fostered generalization: for 
example, the general inner product 
was developed in attempting to obtain 
the advantages of ordinary matrix 
algebra in the treatment of symbolic 
logic. 


Secondly, the lack of any machine 
realization of the language during 
the first seven or eight years of its 
development allowed the designers the 
freedom to make radical changes, a 
freedom not normally enjoyed by 
designers who must observe the needs 
of a large working population 
dependent on the language for their 
daily computing needs. This 
circumstance was due more to the 
dearth of interest in the language 
than to foresight. 


Thirdly, at every stage the design of 
the language was controlled by a 
small group of not more than five 
people. In particular, the men who 
designed (and coded) the 
implementation were part of the 
language design group, and all 
members of the design group were 
involved in broad decisions affecting 
the implementation. On the other 
hand, many ideas were received and 
accepted from people outside the 
design group, particularly from 
active users of some implementation 
of APL. 


Finally, design decisions were made 
by Quaker consensus; controversial 
innovations were deferred until they 
could be revised or reevaluated so as 
to obtain unanimous agreement. 
Unanimity was not achieved without 
cost in time and effort, and many 
divergent paths were explored and 
assessed. For example, many 
different notations for the circular 
and hyperbolic functions were 
entertained over a period of more 
than a year before the present scheme 
was proposed, whereupon it was 
quickly adopted. As the language 
grows, more effort is needed to 
explore the ramifications of any 
major innovation. Moreover, greater 
care is needed in introducing new 
facilities, to avoid the possibility 
of later retraction that would 
inconvenience thousands of users. 


APL LANGUAGE SUMMARY 


APL is a general-purpose programming language with the 
following characteristics (reprinted from APL Language [4]): 


The primitive objects of the language are arrays (lists, 
tables, lists of tables, etc.). For example, A+B is 
meaningful for any arrays A and B, the size of an array 
(⍴A) is a primitive function, and arrays may be indexed 
by arrays as in A[3 1 4 2]. 


The syntax is simple: there are only three statement 
types (name assignment, branch, or neither), there is no 
function precedence hierarchy, functions have either one, 
two, or no arguments, and primitive functions and defined 
functions (programs) are treated alike. 


The semantic rules are few: the definitions of primitive 
functions are independent of the representations of data 
to which they apply, all scalar functions are extended to 
other arrays in the same way (that is, item-by-item), and 
primitive functions have no hidden effects (so-called 
side-effects). 


The sequence control is simple: one statement type 
embraces all types of branches (conditional, 
unconditional, computed, etc.), and the termination of 
the execution of any function always returns control to 
the point of use. 


External communication is established by means of 
variables which are shared between APL and other systems 
or subsystems. These shared variables are treated both 
syntactically and semantically like other variables. A 
subclass of shared variables, system variables, provides 
convenient communication between APL programs and their 
environment. 


The utility of the primitive functions is vastly enhanced 
by operators which modify their behavior in a systematic 
manner. For example, reduction (denoted by /) modifies a 
function to apply over all elements of a list, as in +/L 
for summation of the items of L. The remaining operators 
are scan (running totals, running maxima, etc.), the axis 
operator which, for example, allows reduction and scan to 
be applied over a specified axis (rows or columns) of a 
table, the outer product, which produces tables of values 
as in RATES∘.×YEARS for an interest table, and the inner 
product, a simple generalization of matrix product which 
is exceedingly useful in data processing and other 
non-mathematical applications. 


The number of primitive functions is small enough that 
each is represented by a single easily-read and 
easily-written symbol, yet the set of primitives embraces 
operations from simple addition to grading (sorting) and 
formatting. The complete set can be classified as 
follows: 
Arithmetic: + - × ÷ * ⍟ ○ | ⌊ ⌈ ! ⌹ 
Boolean and Relational: ∨ ∧ ⍱ ⍲ ~ < ≤ = ≥ > ≠ 
Selection and Structural: / \ ⌿ ⍀ [;] ↑ ↓ ⍴ , ⌽ ⍉ ⊖ 
General: ∊ ⍳ ? ⊥ ⊤ ⍋ ⍒ ⍎ ⍕ 


6 Programming Style in APL 


PROGRAMMING STYLE IN APL 


Kenneth E. Iverson 
IBM Thomas J. Watson Research Center 
Yorktown Heights, New York 


When all the techniques of program management and programming practice have been applied, there 
remain vast differences in quality of code produced by different programmers. These differences turn 
not so much upon the use of specific tricks or techniques as upon a general manner of expression, 
which, by analogy with natural language, we will refer to as style. This paper addresses the question 
of developing good programming style in APL. 


Because it does not rest upon specific techniques, good style cannot be taught in a direct manner, but 
it can be fostered by the acquisition of certain habits of thought. The following sections should 
therefore be read more as examples of general habits to be identified and fostered, than as specific 
prescriptions of good technique. 


In programming, as in the use of natural languages, questions of style depend upon the purpose of 
the writing. In the present paper, emphasis is placed upon clarity of expression rather than upon 
efficiency in space and time in execution. However, clarity is often a major contributor to efficiency, 
either directly, in providing a fuller understanding of the problem and leading to the choice of a better, 
more flexible, and more easily documented solution, or indirectly, by providing a clear and complete 
model which may then be adapted (perhaps by programmers other than the original designer) to the 
characteristics of any particular implementation of APL. 


All examples are expressed in 0-origin. Examples chosen from fields unfamiliar to any reader should 
perhaps be skimmed lightly on first reading. 


1. Assimilation of Primitives and Phrases 


Knowledge of the bare definition of a primitive can permit its use in situations where its applicability 
is clearly recognizable. Effective use, however, must rest upon a more intimate knowledge, a feeling 
of familiarity, an ability to view it from different vantage points, and an ability to recognize similar 
uses in seemingly dissimilar applications. 


One technique for developing intimate knowledge of a primitive or a phrase is to create at least one 
clear and general example of its use, an example which can be retained as a graphic picture of its 
behavior when attempting to apply it in complex situations. We will now give examples of creating 
such pictures for three important cases, the outer product, the inner product, and the dyadic transpose. 


Outer product. The formal definition of the result of the expression R←A∘.fB for a specified primitive 
f and arrays A and B of ranks 3 and 4 respectively, may be expressed as: 


R[H;I;J;K;L;M;N]←→A[H;I;J] f B[K;L;M;N] 


Although this definition is essentially complete, it may not be very helpful to the beginner in forming 
a manageable picture of the outer product. 


To this end it might be better to begin with the examples: 


      N∘.+N←1 2 3 4          N∘.×N
2 3 4 5                  1 2  3  4
3 4 5 6                  2 4  6  8
4 5 6 7                  3 6  9 12
5 6 7 8                  4 8 12 16


and emphasize the fact that these outer products are the familiar addition and multiplication tables,
and that, more generally, A∘.fB yields a function table for the function f applied to the sets of
arguments A and B.
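
The function-table reading of the outer product can be sketched outside APL as well. The following Python transliteration is only an illustration; the helper name outer is an assumption made for the sketch and does not appear in the paper.

```python
from operator import add, mul

def outer(f, a, b):
    """Function table of f: row i, column j holds f(a[i], b[j])."""
    return [[f(x, y) for y in b] for x in a]

N = [1, 2, 3, 4]
addition_table = outer(add, N, N)        # counterpart of N∘.+N
multiplication_table = outer(mul, N, N)  # counterpart of N∘.×N
```

Any two-argument function can be supplied in place of add or mul, which is the sense in which the outer product yields a function table.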


One might reinforce the idea by examples in which the outer product illuminates the definition,
properties, or applicability of the functions involved. For example, the expressions
S∘.×S←¯3 ¯2 ¯1 0 1 2 3, and ×S∘.×S yield an interesting picture of the rule of signs in multiplication,
and the expressions R∘.=V and R∘.≤V and ' *'[R∘.=V] (with V←(X-3)×(X←1+⍳7)-5 and
with R specified as the range of V, that is, R←8 7 6 5 4 3 2 1 0 ¯1) illustrate the applicability
of outer products in defining and producing graphs and bar charts. These and other uses of outer
products as function tables are treated in Iverson [1].


Useful pictures of outer products of higher rank may also be formed. For example,
D∘.∨D∘.∨D←0 1 gives a rank three function table for the or function with three arguments, and if
A is a matrix of altitudes of points in a rectangular area of land and C is a vector of contour levels
to be indicated on a map of the area, then the expression C∘.≤A relates the points to the contour
levels and +⌿C∘.≤A gives the index of the contour level appropriate to each point.


Inner Product. Although the inner product is perhaps most used with at least one argument of rank 
two or more, a picture of its behavior and wide applicability is perhaps best obtained (in the manner 
employed in Chapter 13 of Reference 1) by first exploring its significance when applied to vector 
arguments. For example: 


      +/P×Q          Total cost in terms of price and quantity.
21

      ⌊/P+Q          Minimum trip of two legs with distances to and from
3                    connecting point given by P and Q.

      ×/P*Q          The number whose prime factorization is specified by
700                  the exponents Q.

      +/P×Q          Torque due to weights Q placed at positions P
21                   relative to the axis.


The first and last examples above illustrate the fact that the same expression may be given different 
interpretations in different fields of application. 
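
The values of P and Q are not shown with these examples. As a hedged check, the Python sketch below assumes P is 2 3 5 7 and Q is 2 0 2 1, a choice that reproduces all of the results quoted; the helper name inner is likewise an assumption of the sketch. APL reduces from right to left, but for the associative functions used here a left-to-right fold gives the same value.

```python
from functools import reduce
from operator import add, mul

def inner(f, g, p, q):
    """Vector inner product p f.g q: the f-reduction of the elementwise g."""
    return reduce(f, [g(x, y) for x, y in zip(p, q)])

# Assumed values (not given in the text); they reproduce the quoted results.
P = [2, 3, 5, 7]
Q = [2, 0, 2, 1]

total_cost = inner(add, mul, P, Q)     # +/P×Q
shortest_trip = inner(min, add, P, Q)  # ⌊/P+Q
factored = inner(mul, pow, P, Q)       # ×/P*Q
```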


The inner product is defined in terms of expressions of the form used above. Thus, P+.×Q ←→ +/P×Q
and, more generally for any pair of scalar functions f and g, Pf.gQ ←→ f/PgQ. The extension to arrays
of higher rank is made in terms of the definition for vectors; each element of the result is the inner
product of a pair of vectors from the two arguments. For the case of matrix arguments, this can be
represented by the following picture:


The +.× inner product applied to two vectors V and W (as in V+.×W) can be construed as a
weighted sum of the vector V, whose elements are each “weighted” by multiplication by the
corresponding elements of W, and then summed. This notion can be extended to give a useful interpretation
of the expression M+.×W, for a matrix M, as a weighted sum of the column vectors of M. Thus:


      W←3 1 4
      ⎕←M←3 3⍴1+⍳9
1 2 3
4 5 6
7 8 9
      M+.×W
17 41 65


This result can be seen to be equivalent to writing the elements of W below the columns of M, 
multiplying each column vector of M by the element below it, and adding. 


If W is replaced by a boolean vector B (whose elements are zeros or ones), then M+.×B can still be
construed as a weighted sum, but can also be construed as sums over subsets of the rows of M, the
subsets being chosen by the 1’s in the boolean vector. For example:


      B←1 0 1
      M+.×B
4 10 16
      B/M
1 3
4 6
7 9
      +/B/M
4 10 16


Finally, by using an expression of the form M×.*B instead of M+.×B, a boolean vector can be used
to apply multiplication over a specified subset of each of the rows of M. Thus:


      M×.*B
3 24 63
      ×/B/M
3 24 63


This use of boolean vectors to apply functions over specified subsets of arrays will be pursued further 
in the section on generalization, using boolean matrices as well as vectors. 
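
The three matrix examples can be checked with a short Python sketch. It assumes that M is the 3 by 3 matrix of the integers 1 to 9, which is consistent with every result quoted above; the helper name inner_mv is an assumption of the sketch.

```python
from functools import reduce
from operator import add, mul

def inner_mv(f, g, m, w):
    """Matrix-vector inner product m f.g w: one f-reduction of g per row of m."""
    return [reduce(f, [g(x, y) for x, y in zip(row, w)]) for row in m]

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # assumed value of M
W = [3, 1, 4]
B = [1, 0, 1]

weighted = inner_mv(add, mul, M, W)         # M+.×W
subset_sums = inner_mv(add, mul, M, B)      # M+.×B
subset_products = inner_mv(mul, pow, M, B)  # M×.*B
```

Raising to a power of 0 or 1 replaces unselected elements by 1, the identity for multiplication, which is why the boolean vector selects the factors of each product.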
Dyadic transpose. Although the transposition of a matrix is easy to picture (as an interchange of
rows and columns), the dyadic transpose of an array of higher rank is not, as may be seen by trying
to compare the following arrays:


      A             2 1 3⍉A          3 2 1⍉A

ABCD            ABCD             AM
EFGH            MNOP             EQ
IJKL                             IU
                EFGH
MNOP            QRST             BN
QRST                             FR
UVWX            IJKL             JV
                UVWX
                                 CO
                                 GS
                                 KW

                                 DP
                                 HT
                                 LX


The difficulty increases when we permit left arguments with repeated elements which produce “diago- 
nal sections” of the original array. This general transpose is, however, a very useful function and worth 
considerable effort to assimilate. The following example of its use may help. 


The associativity of a function f is normally expressed by the identity:


      X f (Y f Z) ←→ (X f Y) f Z


and a test of the associativity of the function on some specified domain D←1 2 3 can be made by
comparing the two function tables D∘.f(D∘.fD) and (D∘.fD)∘.fD corresponding to the left and
right sides of the identity. For example:


      D←1 2 3
      ⎕←L←D∘.-(D∘.-D)      ⎕←R←(D∘.-D)∘.-D      L=R
 1 2 3                 ¯1 ¯2 ¯3              0 0 0
 0 1 2                 ¯2 ¯3 ¯4              0 0 0
¯1 0 1                 ¯3 ¯4 ¯5              0 0 0

 2 3 4                  0 ¯1 ¯2              0 0 0
 1 2 3                 ¯1 ¯2 ¯3              0 0 0
 0 1 2                 ¯2 ¯3 ¯4              0 0 0

 3 4 5                  1  0 ¯1              0 0 0
 2 3 4                  0 ¯1 ¯2              0 0 0
 1 2 3                 ¯1 ¯2 ¯3              0 0 0

      ∧/,L=R
0
      ⎕←L←D∘.+(D∘.+D)      ⎕←R←(D∘.+D)∘.+D      L=R
3 4 5                  3 4 5                 1 1 1
4 5 6                  4 5 6                 1 1 1
5 6 7                  5 6 7                 1 1 1

4 5 6                  4 5 6                 1 1 1
5 6 7                  5 6 7                 1 1 1
6 7 8                  6 7 8                 1 1 1

5 6 7                  5 6 7                 1 1 1
6 7 8                  6 7 8                 1 1 1
7 8 9                  7 8 9                 1 1 1


For the case of logical functions, the test made by comparing the function tables can be made complete,
since the functions are defined on a finite domain D←0 1. For example:


      D←0 1
      ∧/,(D∘.∨(D∘.∨D))=((D∘.∨D)∘.∨D)
1
      ∧/,(D∘.≠(D∘.≠D))=((D∘.≠D)∘.≠D)
1
      ∧/,(D∘.⍲(D∘.⍲D))=((D∘.⍲D)∘.⍲D)
0

Turning to the identity for the distribution of one function over another we have expressions such
as:


      X×(Y+Z) ←→ (X×Y)+(X×Z)
and
      X∧(Y∨Z) ←→ (X∧Y)∨(X∧Z)


Attempting to write the function table comparison for the latter case as:


      L←D∘.∧(D∘.∨D)
      R←(D∘.∧D)∘.∨(D∘.∧D)


we encounter a difficulty since the two sides L and R do not even agree in rank, being of ranks 3
and 4.


The difficulty clearly arises from the fact that the axes of the left and right function tables must agree
according to the names in the original identity; in particular, the X in position 0 on the left and in
positions 0 and 2 on the right implies that axes 0 and 2 on the right must be “run together” to form
a single axis in position 0. The complete disposition of the four axes on the right can be seen to be
described by the vector 0 1 0 2, showing where in the result each of the original axes is to appear.
This is a paraphrase of the definition of the dyadic transpose, and we can therefore compare L with
0 1 0 2⍉R. Thus:


      ∧/,(D∘.∧(D∘.∨D))=0 1 0 2⍉((D∘.∧D)∘.∨(D∘.∧D))
1
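
The effect of running two axes together can be made concrete in a Python sketch in which each function table is a dictionary keyed by index tuples. The names L, R and T follow the text where possible; the dictionary representation is an assumption of the sketch.

```python
from itertools import product

D = [0, 1]

# L[(x, y, z)] = x AND (y OR z): a rank-3 function table.
L = {(x, y, z): x & (y | z) for x, y, z in product(D, repeat=3)}

# R[(a, b, c, d)] = (a AND b) OR (c AND d): a rank-4 function table.
R = {(a, b, c, d): (a & b) | (c & d) for a, b, c, d in product(D, repeat=4)}

# Counterpart of 0 1 0 2⍉R: axes 0 and 2 of R are run together, so the
# element at (x, y, z) of the result is R[(x, y, x, z)].
T = {(x, y, z): R[(x, y, x, z)] for x, y, z in product(D, repeat=3)}

distributes = all(L[k] == T[k] for k in L)
```

The repeated 0 in the left argument selects a diagonal section of R, which reduces its rank from 4 to 3 and makes the comparison with L possible.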


The idea of thorough assimilation discussed thus far in terms of primitive expressions can be applied 
equally to commonly used phrases and defined functions. For example: 
      ⍳⍴V                     The indices of vector V
      ⍳⍴⍴A                    The axes of A
      ×/⍴A                    The number of elements in A
      V[⍋V]                   Sorting the vector V
      M[⍋+⌿R<.-⍉R←M,0;]       Sorting the rows of M into lexical order
      ⍉F⍉M                    Applying to columns a function F defined on rows


Collections of commonly used phrases and functions may be found in Perlis and Rugaber [2] and
in Macklin [3].


2. Function Definition 


A complex system should best be designed not as a single monolithic function, but as a structure built 
from component functions which are meaningful in themselves and which may in turn be realized 
from simpler components. In order to interact with other elements of a system, and therefore serve 
as a “building block”, a component must possess inputs and outputs. A defined function with an 
explicit argument, or arguments, and an explicit result provides such a component. 


If a component function produces side effects by setting global variables used by other components, 
the interaction between components becomes much more difficult to analyze and comprehend than if 
communication between components is limited to their explicit arguments and explicit results. Ideally, 
systems should be designed with communication so constrained and, in practice, the number of global 
variables employed should be severely limited. 


Because the fundamental definition form in APL (produced by the use of ∇ or by ⎕FX, and commonly
called the del form) is necessarily general, it permits the definition of functions which produce side
effects, which have no explicit arguments, and which have no explicit results. The direct form which
uses the symbols ⍺ and ⍵ (as defined in Iverson [4]) exercises a discipline more appropriate to good
design, allowing only the definition of functions with explicit results, and localizing all names which
are specified within the function, thereby eliminating side effects outside of it.


The direct form of definition may be either simple or conditional. The latter form will be discussed 
in section 6. The simple form may be illustrated as follows. The expression 


      F:⍵+4÷⍺


may be read as “F is defined by the expression ⍵+4÷⍺, where ⍺ represents the first argument of F
and ⍵ represents the second”. Thus 8 F 3 yields 3.5.


If a direct definition is to produce a machine executable function, the definition must be translated by a suitable 
function. For example, if this translation is called DEF, then: 


      DEF                        DEF
F:⍺+÷⍵                       SORT:⍵[⍋⍵]
F                            SORT

Se DORT 3 sb: SO E 
325 t 23 3 SE 67 

      DEF                        DEF
P:+/⍺×⍵*⍳⍴⍺                  POL:(⍵∘.*⍳⍴⍺)+.×⍺
P                            POL

LO Sary 1:28: 3: 1. 202.0: 1:2 ee 
125 t- G 2) 08 125 
The direct form of definition will be used in the examples which follow. The question of the
translation function DEF is discussed in Appendix A. 


3. Generality 


It is often possible to take a function defined for a specific purpose and modify it so that it applies
to a wider class of problems. For example, the function AV:(+/⍵)÷⍴⍵ may be applied to a numeric
vector to produce its average. However, it does not correctly average the rows of a matrix; the simple
modification AV2:(+/⍵)÷¯1↑⍴⍵ not only permits this, but also averages the vectors along the
last axis of any array, including the case of a vector.


The problem might also be generalized to a weighted average, in which a vector left argument specifies
the weights to be applied in summation, the result being normalized by division by the total weight.
Again this function could be defined to apply to a vector right argument in the form
WAV:(+/⍺×⍵)÷+/⍺, but, applying the inner product in the manner discussed in the preceding section,
we may define a function which applies to matrices:


      WAV2:(⍵+.×⍺)÷+/⍺


Thus:
      ⎕←M←3 4⍴⍳12
0 1  2  3
4 5  6  7
8 9 10 11
      W←2 1 3 4
      W WAV2 M
1.9 5.9 9.9


The same function may be interpreted in different ways in different disciplines. For example, if column 
I of M gives the coordinates of a mass of weight W[I], then W WAV2 M is the center of gravity of the 
set of masses. Moreover, if the elements of W are required to be non-negative, then the result 
W WAV2 M is always a point in the convex space defined by the points of M, that is a point within 
the body whose vertices are given by M. This can be more easily seen in the following equivalent 
function: 


      WAV3:⍵+.×(⍺÷+/⍺)


in which the weights are normalized to sum to 1.
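
As a hedged illustration, the two weighted-average formulations can be transliterated into Python. The function names wav2 and wav3 mirror the APL names, and M is taken to be the 3 by 4 matrix of the integers 0 to 11, which agrees with the result 1.9 5.9 9.9 for the weights 2 1 3 4.

```python
def wav2(w, m):
    """Weighted average of the columns of m: (m +.× w) ÷ +/w."""
    total = sum(w)
    return [sum(x * wi for x, wi in zip(row, w)) / total for row in m]

def wav3(w, m):
    """Same result, with the weights first normalized to sum to 1."""
    total = sum(w)
    unit = [wi / total for wi in w]
    return [sum(x * u for x, u in zip(row, unit)) for row in m]

M = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
W = [2, 1, 3, 4]
centre = wav2(W, M)
```

With non-negative weights each coordinate of the result lies between the smallest and largest entries of its row, the one-dimensional face of the convexity property described above.
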


Striving to write functions in a general way not only leads to functions with wider applicability, but
often provides greater insight into the problem. We will attempt to illustrate this in three areas,
functions on subsets, indexing, and polynomials.


Functions on subsets. It is often necessary to apply some function (such as addition or maximum)
over all elements in some subset of a given list. For example, to sum all non-negative elements in
the list X←3 ¯4 2 0 ¯3 7, we might first define the boolean vector which identifies the desired subset,
then select the set, and then sum it:


      X≥0              (X≥0)/X          +/(X≥0)/X
1 0 1 1 0 1        3 2 0 7          12
In general, if B is a boolean vector which defines a subset, we may write +/B/X. However, as seen
in the discussion of inner product, this may also be written in the form X+.×B, and in this form it
applies more generally to a boolean matrix (or higher rank array) in which the columns (or vectors
along the leading axis) determine the different subsets. For example, if


      ⎕←B←(4⍴2)⊤⍳2*4
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1


then the columns of B represent all possible subsets of a vector of four elements, and if X←2 3 5 7
then:


      X+.×B
0 7 5 12 3 10 8 15 2 9 7 14 5 12 10 17


yields the sums over all subsets of X, including the empty set (0 0 0 0), and the complete set
(1 1 1 1).


It is also easy to establish that


      X×.*B
1 7 5 35 3 21 15 105 2 14 10 70 6 42 30 210


yields the products over all subsets, and that (for non-negative vectors X) the expression


      X⌈.×B
0 7 5 7 3 7 5 7 2 7 5 7 3 7 5 7


yields the maxima over all subsets of X. This last expression holds only for non-negative values of
X, but could be replaced by the more general expression M+(X-M←⌊/X)⌈.×B. A more general approach
to this problem (in terms of a new operator) is discussed in Section 2 of Iverson [5].
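
The subset computations can be sketched in Python by enumerating every 0/1 pattern of length four in counting order, which corresponds to the columns of the boolean matrix B. The variable names are assumptions of the sketch.

```python
from itertools import product

X = [2, 3, 5, 7]

# One tuple per column of B: every 0/1 pattern of length 4, in counting order.
subsets = list(product([0, 1], repeat=len(X)))

# Counterpart of X+.×B: the sum over each subset.
sums = [sum(x * b for x, b in zip(X, bits)) for bits in subsets]

# Counterpart of X×.*B: the product over each subset.
prods = []
for bits in subsets:
    p = 1
    for x, b in zip(X, bits):
        p *= x ** b
    prods.append(p)

# Counterpart of X⌈.×B: the maximum over each subset (non-negative X only).
maxima = [max(x * b for x, b in zip(X, bits)) for bits in subsets]
```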

If we have a list A with repeated elements, and if we need to evaluate some costly function F on each
element of A, then it may be efficient to evaluate F only on the nub of A (consisting of the distinct
elements of A) and then distribute the results to the appropriate positions to yield F A. Thus:


Function        Definition                 Example

                                           A←3 2 3 5 2 3

Nub             NUB:((⍳⍴⍵)=⍵⍳⍵)/⍵          NUB A
                                        3 2 5

Distribution    DIS:(NUB ⍵)∘.=⍵            DIS A
                                        1 0 1 0 0 1
                                        0 1 0 0 1 0
                                        0 0 0 1 0 0

Example         F:⍵*2                      F A
                                        9 4 9 25 4 9

                                           F NUB A
                                        9 4 25

                                           (F NUB A)+.×DIS A
                                        9 4 9 25 4 9
From the foregoing it may be seen that an inner product post-multiplication by the distribution matrix
DIS A distributes the results F NUB A appropriately. The distribution function may also be used to
perform aggregation or summarization. For example, if C is a vector of costs associated with the
account numbers recorded in A, then summarization of the costs for each account may be obtained
by pre-multiplication by DIS A. Thus:


      C←1 2 3 4 5 6
      (DIS A)+.×C
10 7 4
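
A minimal Python sketch of the nub and distribution idea follows. It assumes A is 3 2 3 5 2 3 and C is 1 2 3 4 5 6, values that are consistent with the results quoted in the text; the names nub, dis and costly are chosen for the sketch, with costly standing in for the expensive function F.

```python
def nub(a):
    """Distinct elements of a, in order of first appearance."""
    seen = []
    for x in a:
        if x not in seen:
            seen.append(x)
    return seen

def dis(a):
    """Distribution matrix: row i marks where the i-th nub element occurs in a."""
    return [[int(x == n) for x in a] for n in nub(a)]

def costly(x):
    """Stand-in for an expensive function F (here simply the square)."""
    return x ** 2

A = [3, 2, 3, 5, 2, 3]  # assumed account numbers
C = [1, 2, 3, 4, 5, 6]  # assumed costs
D = dis(A)

# Post-multiplication: evaluate on the nub only, then distribute.
per_nub = [costly(n) for n in nub(A)]
distributed = [sum(v * row[j] for v, row in zip(per_nub, D)) for j in range(len(A))]

# Pre-multiplication: total cost for each distinct account number.
summarized = [sum(d * c for d, c in zip(row, C)) for row in D]
```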


Indexing. If M is a matrix of N←¯1↑⍴M columns, and if I and J are scalars, then element M[I;J]
can be selected from the ravel R←,M by the expression R[(N×I)+J]. More generally, if K is a
two-rowed matrix whose columns are indices to elements of M, then these elements may be selected
much more easily from R (by the expression R[(N×K[0;])+K[1;]]) than from M itself. Moreover,
the indexing expression can be simplified to R[(N,1)+.×K], or to R[(⍴M)⊥K].


The last form is interesting in that it applies to an array M of any rank P, provided that K has P
rows. More generally, it applies to an index array K of any rank (provided that (⍴⍴M)=1↑⍴K) to
produce a result of shape 1↓⍴K. To summarize, we may define a general indexing function:


      SUB:(,⍺)[(⍴⍺)⊥⍵]


and use it as in the following examples: 


      ⎕←M←3 3⍴⍳9            ⎕←K←3|2 5⍴⍳10
0 1 2                   0 1 2 0 1
3 4 5                   2 0 1 2 0
6 7 8
      M SUB K
2 3 7 2 3
      M SUB 3|2 3 5⍴⍳30
0 4 8 0 4
8 0 4 8 0
4 8 0 4 8
      (4 4 4⍴⍳4*3) SUB 4|3 2 6⍴2×⍳36
0 42 0 42 0 42
0 42 0 42 0 42

This use of the base value function in the expression (⍴⍺)⊥⍵ correctly suggests the possible use of
the inverse expression (⍴⍺)⊤⍵ to obtain the indices to an array ⍺ in terms of the index to its ravel
(that is, ⍵).
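
The relation between an index and its position in the ravel can be sketched in Python. The names flat, base_value and represent are assumptions of the sketch, standing for ravel, base value and its inverse; M and K are the 3 by 3 matrix of the integers 0 to 8 and a two-rowed index matrix.

```python
def flat(a):
    """Ravel of a nested list, in row-major order."""
    return [x for row in a for x in flat(row)] if isinstance(a, list) else [a]

def base_value(shape, index):
    """Row-major offset of a multi-dimensional index (a mixed-radix value)."""
    offset = 0
    for radix, digit in zip(shape, index):
        offset = offset * radix + digit
    return offset

def represent(shape, offset):
    """Inverse of base_value: the index whose offset in the ravel is given."""
    index = []
    for radix in reversed(shape):
        offset, digit = divmod(offset, radix)
        index.append(digit)
    return index[::-1]

M = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
K = [[0, 1, 2, 0, 1], [2, 0, 1, 2, 0]]  # each column is one index pair
R = flat(M)
picked = [R[base_value((3, 3), col)] for col in zip(*K)]
```

Because base_value takes the shape as an argument, the same selection works for arrays of any rank, which is the generality claimed for SUB.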


Polynomials. If F:+/⍺×⍵*⍳⍴⍺, then the expression C F X evaluates the polynomial with coefficients
C for the scalar argument X. The more general function:


      P:(⍵∘.*⍳⍴⍺)+.×⍺


applies to a vector right argument and (since ⍵∘.*⍳⍴⍺ is then a matrix M, and since M+.×⍺ is a linear
function of ⍺) emphasizes the fact that the polynomial is a linear function of its coefficients. If
(⍴⍵)=⍴⍺, then M is square, and if the elements of ⍵ are all distinct (that is, (⍴⍵)=⍴NUB ⍵), then
M is non-singular, and the function:


      FIT:(⌹⍵∘.*⍳⍴⍺)+.×⍺




is inverse to P in the sense that:

      C ←→ (C P X) FIT X     and     Y ←→ (Y FIT X) P X


In other words, if Y←F X for some scalar function F, then Y FIT X yields the coefficients of the
polynomial which fits the function F at the points (arguments) X. For example:


      3⍕Y←*X←0 .5 1 1.5
1.000 1.649 2.718 4.482
      3⍕C←Y FIT X
1.000 1.059 .296 .364
      3⍕C P X
1.000 1.649 2.718 4.482


The function FIT can be defined in a neater equivalent form, using the dyadic form of ⌹ as
FIT:⍺⌹⍵∘.*⍳⍴⍺. Moreover, the more general function:


      LSF:⍺⌹⍵∘.*⍳N


(which depends upon the global variable N) yields the N coefficients of the polynomial of order
N-1 which best fits the function ⍺←F ⍵ in the least squares sense. Thus:


      N←4
      3⍕Y LSF X
1.000 1.059 .296 .364
      N←3                        N←2
      3⍕C←Y LSF X                3⍕C←Y LSF X
1.014 .631 1.115              .735 2.303
      3⍕C P X                    3⍕C P X
1.014 1.608 2.759 4.468       .735 1.886 3.038 4.189
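
The least squares fit can be sketched in Python without any library support. This is not how the APL primitive is implemented; the sketch swaps in the textbook normal equations solved by Gauss-Jordan elimination, which is adequate for small, well-conditioned cases. The name lsf mirrors the APL function, with the number of coefficients passed as an explicit argument n rather than held in a global variable.

```python
from math import exp, log

def lsf(y, x, n):
    """Least-squares coefficients (ascending powers) of a polynomial with n terms."""
    rows = [[xi ** j for j in range(n)] for xi in x]  # counterpart of x∘.*⍳n
    # Normal equations (A'A)c = A'y, held as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

X = [0, 0.5, 1, 1.5]
Y = [exp(v) for v in X]
line = lsf(Y, X, 2)   # best straight line
cubic = lsf(Y, X, 4)  # four coefficients through four points: an exact fit

# Growth-rate estimate: fit a line to the logarithms, then exponentiate.
T = [0, 1, 2, 3, 4, 5]
G = [300 * 1.09 ** t for t in T]
start, rate = (exp(c) for c in lsf([log(g) for g in G], T, 2))
```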


The case N+2 yields the best straight line fit. It can be used, for example, in estimating the “compound 
interest” or “growth rate” of a function that is assumed to be approximately exponential. This is done 
by fitting the logarithm of the values and then taking the exponential of the result. For example: 


      X←0 1 2 3 4 5
      3⍕Y←300×1.09*X
300.000 327.000 356.430 388.509 423.474 461.587
      N←2
      3⍕E←(⍟Y) LSF X
5.704 .086
      *E
300 1.09
      3⍕(*E[0])×(*E[1])*X
300.000 327.000 356.430 388.509 423.474 461.587
      3⍕Y←Y+?6⍴⎕RL←50
300.000 355.000 395.430 434.509 454.474 508.587
      3⍕E←(⍟Y) LSF X
5.749 .099
      *E
313.7594907 1.10436869
      3⍕(*E[0])×(*E[1])*X
313.759 346.506 382.671 422.609 466.717 515.427


The growth rate is *E[1], and the estimated compound interest rate is therefore given by the function


      ECI:100×¯1+*1↓(⍟⍺) LSF ⍵


For example:


      1⍕I←Y ECI X
10.4
      3⍕(*E[0])×(1+.01×I)*X
313.759 346.506 382.671 422.609 466.717 515.427


General considerations can often lead to simple solutions of specific problems. Consider, for example,
the definition of a “times” function T for the multiplication of polynomials, that is:


      (C P X)×(D P X) ←→ (C T D) P X


The function T is easily shown to be linear in both its left and right arguments, and can therefore
be expressed in the form C+.×B+.×D. The array B is a boolean array whose unit elements serve to
multiply together appropriate elements of C and D, and whose zeros suppress contributions from other
pairs of elements. The elements of B are determined by the exponents associated with C, with D, and
with the result vector, that is, ⍳⍴C and ⍳⍴D and ⍳⍴1↓C,D. For each element of the result, the
“deficiency” of each element of the exponents associated with D is given by the table
S←(⍳⍴1↓C,D)∘.-⍳⍴D, and the array B is obtained by comparing this deficiency with the contributions
from the exponents associated with C, that is, (⍳⍴C)∘.=S. To summarize, the times function may
be defined as follows:


      T:⍺+.×(⍺ B ⍵)+.×⍵
      B:(⍳⍴⍺)∘.=(⍳⍴1↓⍺,⍵)∘.-⍳⍴⍵


For example: 


      ⎕←E←(C←1 2 1) T D←1 3 3 1
1 5 10 10 5 1
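
The role of the boolean array B can be made explicit in a Python sketch. The names times and poly are assumptions of the sketch, standing for T and P, and the coefficient vectors 1 2 1 and 1 3 3 1 are used as an assumed test case.

```python
def times(c, d):
    """Coefficients of the product of polynomials c and d (ascending powers)."""
    n = len(c) + len(d) - 1  # length of the result, as in ⍴1↓c,d
    # b[i][k][j] is 1 exactly when exponent i of c and exponent j of d
    # contribute to exponent k of the result, that is, when i = k - j.
    b = [[[int(i == k - j) for j in range(len(d))] for k in range(n)]
         for i in range(len(c))]
    return [sum(c[i] * b[i][k][j] * d[j]
                for i in range(len(c)) for j in range(len(d)))
            for k in range(n)]

def poly(c, x):
    """Value at x of the polynomial with coefficients c (ascending powers)."""
    return sum(ci * x ** i for i, ci in enumerate(c))

E = times([1, 2, 1], [1, 3, 3, 1])
```

Forming the whole of B is wasteful in practice, since most of its elements are zero, but it displays the bilinear structure that the text exploits when inverting the relation for division.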


Since the expression ⍺+.×(⍺ B ⍵) yields a matrix, it appears that the inverse problem of defining a
function DB (divided by) for polynomial division might be solved by inverting this matrix. To this
end we define a related function BQ expressed in terms of E and C, rather than in terms of C and
D:


      BQ:(⍳⍴⍺)∘.=(⍳⍴⍵)∘.-⍳1+(⍴⍵)-⍴⍺


and consider the matrix M←C+.×C BQ E.


The expression (⌹M)+.×E fails to work properly because M is not square, and we recognize two cases,
the first being given by inverting the top part of M (that is, ⌹(2⍴⌊/⍴M)↑M) and yielding a quotient
with high-order remainder, and the second by inverting the bottom part and yielding a quotient with
low-order remainder. Thus:


      DBHO:(D↑⍺)⌹(2⍴D←⌊/⍴M)↑M←⍵+.×⍵ BQ ⍺

      DBLO:(D↑⍺)⌹(2⍴D←-⌊/⍴M)↑M←⍵+.×⍵ BQ ⍺


For example:


      E←1 5 10 10 7 4               E←4 7 10 10 5 1
      C←1 2 1                       C←1 2 1
      ⎕←Q←E DBHO C                  ⎕←Q←E DBLO C
1 3 3 1                          1 3 3 1
      ⎕←R←E-C T Q                   ⎕←R←E-C T Q
0 0 0 0 2 3                      3 2 0 0 0 0


The treatment of polynomials is a prolific source of examples of the insights provided by precise
general functions for various processes, insights which often lead to better ways of carrying out
commonly-needed hand calculations. For example, a function E for the expansion of a polynomial C
(defined more precisely by the relation (E C) P X ←→ C P X+1) can be defined as:


      E:(BC ⍴⍵)+.×⍵          BC:(⍳⍵)∘.!⍳⍵


Working out an example shows that manual expansion of C can be carried out by jotting down the
table of binomial coefficients of order ⍴C (that is, BC ⍴⍵) and then taking a weighted sum of its
columns, the weights being the elements of C.


4. Identities 


An identity is an equivalence between two different expressions. Although identities are commonly
thought of only as tools of mathematical analysis, they can be an important practical tool for simplifying
and otherwise modifying expressions used in defining functions.


Consider, for example, a function F which applied to a boolean vector suppresses all 1’s after the first.
It could be used, for example, in the expression (~F X='D')/X to suppress the first D in a character
string X. The function could be defined as F:(⍵⍳1)=⍳⍴⍵. However, the following identity holds:


      (⍵⍳1)=⍳⍴⍵ ←→ <\⍵


and we may therefore use one or other of the equivalent functions:


      F:(⍵⍳1)=⍳⍴⍵          G:<\⍵


One may react to a putative identity in several ways: accept it on faith and use it as a practical tool,
work some examples to gain confidence and a feeling for why it works, or prove its validity in a general
way. The last two take more time, but often lead to further insights and further identities. Thus the
application of the functions F and G to a few examples might lead one to see that G applies in a
straightforward way to the rows of a matrix, but F does not, that both can be applied to locate the
first zero by the expressions ~F~B and ~G~B, and (perhaps) that the latter case (that is, ~<\~B)
can be replaced by the simpler expression ≤\B.
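
Working some examples can itself be mechanized. The Python sketch below checks both identities exhaustively for all boolean vectors of up to five elements; the names scan, less, less_equal, negate and first_one are assumptions of the sketch, and scan reproduces the right-to-left reduction over each prefix that defines the APL scan.

```python
from itertools import product

def scan(f, v):
    """APL-style scan: element i is the right-to-left f-reduction of v[:i+1]."""
    out = []
    for i in range(len(v)):
        acc = v[i]
        for x in reversed(v[:i]):
            acc = f(x, acc)
        out.append(acc)
    return out

def less(x, y):
    return int(x < y)

def less_equal(x, y):
    return int(x <= y)

def negate(v):
    return [1 - x for x in v]

def first_one(b):
    """The function F: a mask marking the position of the first 1 in b."""
    pos = b.index(1) if 1 in b else len(b)
    return [int(i == pos) for i in range(len(b))]

vectors = [list(v) for n in range(1, 6) for v in product([0, 1], repeat=n)]

# F and G agree: searching for the first 1 is the same as a less-than scan.
identity_holds = all(first_one(b) == scan(less, b) for b in vectors)

# Locating the first zero: not-scan-not with < equals a plain scan with <=.
dual_holds = all(negate(scan(less, negate(b))) == scan(less_equal, b)
                 for b in vectors)
```

An exhaustive check of this kind is not a general proof, since it covers only the lengths tried, but it gives the confidence described above at very little cost.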


As a second example, consider the expression Y←((~B)/X),B/X with B←X≤2. The result is to classify
the elements of X by placing all those in a specified class (those less than or equal to 2) at the tail
end of Y. More generally, we may define a classification function C which classifies the elements of
its right argument according to its boolean left argument:


      C:((~⍺)/⍵),⍺/⍵


For example:


      X←3 1 4 7 2
      ⎕←B←X≤2
0 1 0 0 1
      B C X
3 4 7 1 2


Since the result of C is a permutation of its right argument, it should be possible to define an equivalent
function in the form ⍵[V], where V is some permutation vector. It can be shown that the appropriate
permutation vector is simply ⍋⍺. For example:


      ⍋B                    X[⍋B]
0 2 3 1 4             3 4 7 1 2


Thus:

      P:⍵[⍋⍺]     and     C:((~⍺)/⍵),⍺/⍵

are equivalent functions.


For any given function there are often related functions (such as an inverse) of practical interest. For
example, if V←B C X, then there is some inverse function CI such that B CI V yields X. Moreover,
the definition of a related function may be much easier to derive from one of several different
equivalent definitions of the original function than from the others. Thus the definition of the inverse
CI may not be immediately evident from the definition C, but from the definition P it is clear that
what is needed is the inverse permutation. Thus:


      CI:⍵[⍋⍋⍺]
      ⎕←V←B C X             B CI V
3 4 7 1 2             3 1 4 7 2


Finally, a given formulation of a function may suggest a simple formulation for a similar function. 
For example, the application of the function P with a left argument containing a single 1 can be seen 
to effect a rotation of that suffix of the right argument marked off by the location of the 1. This 
suggests the following formulation for a function which rotates each of the segments marked off by 
the 1’s in the left argument: 


      RS:⍵[⍋⍺++\⍺]


      1 0 0 1 0 0 0 1 0 RS 'ABCDEFGHI'
BCAEFGDIH
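
The classification function, its permutation form, its inverse, and the segment rotation can all be sketched in Python around a single grade function. The function names are assumptions of the sketch; Python's sort is stable, as is the APL grade, which is what makes the two formulations agree.

```python
def grade_up(v):
    """Indices that sort v in ascending order (stable, like the APL grade)."""
    return sorted(range(len(v)), key=lambda i: v[i])

def classify(b, x):
    """C: items with b equal to 0 first, then items with b equal to 1."""
    return ([xi for bi, xi in zip(b, x) if not bi]
            + [xi for bi, xi in zip(b, x) if bi])

def classify_by_grade(b, x):
    """P: the same permutation, written as indexing by the grade of b."""
    return [x[i] for i in grade_up(b)]

def unclassify(b, v):
    """CI: the grade of the grade is the inverse permutation."""
    return [v[i] for i in grade_up(grade_up(b))]

def rotate_segments(a, w):
    """RS: rotate each segment of w that begins at a 1 in a."""
    running, keys = 0, []
    for bit in a:
        running += bit
        keys.append(bit + running)  # counterpart of ⍺++\⍺
    return [w[i] for i in grade_up(keys)]

X = [3, 1, 4, 7, 2]
B = [int(x <= 2) for x in X]
V = classify(B, X)
rotated = "".join(rotate_segments([1, 0, 0, 1, 0, 0, 0, 1, 0], "ABCDEFGHI"))
```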


Dualities. We will now consider one class of very useful identities in some detail. The most familiar
example of the class is known as deMorgan’s law and is expressed as follows:


      X∧Y ←→ ~(~X)∨(~Y)


Useful related forms of deMorgan’s law are:


      ∧/V ←→ ~∨/~V
      ∧\V ←→ ~∨\~V
      M∨.∧N ←→ ~(~M)∧.∨(~N)



DeMorgan’s law concerns a relation between the functions and, or, and not (∧ ∨ ~), and we say that
∧ is the dual of ∨ with respect to ~. Each of the boolean functions of two arguments possesses a dual
with respect to ~. For example, X≤Y ←→ ~(~X)<(~Y), and from this the three related identities
(≤/V ←→ ~</~V, etc.) follow in the manner shown above. The five dual pairs of boolean functions
are:


      ∨   ⍱   <   =   >
      ∧   ⍲   ≤   ≠   ≥


These dualities are frequently useful in simplifying expressions used in logical selections. For example,
we have already seen the use of the duality between ≤ and < to replace the expression ~<\~⍵ by
≤\⍵.


Useful dualities are not limited to boolean functions. For example, maximum and minimum (⌈ and
⌊) are dual with respect to arithmetic negation (-) as follows:


      X⌈Y ←→ -(-X)⌊(-Y)


Again the related forms of duality follow.


More generally, duality is defined in terms of any monadic function M and its inverse MI as follows: 
a function F is said to be the dual of a function G with respect to M if: 


      X F Y ←→ MI (M X) G (M Y)


In the preceding examples of duality, each of the monadic functions used (~ and -) happened to be 
self-inverse and MI was therefore indistinguishable from M. 


The general form includes the duality with respect to the natural logarithm function ⍟ which lies at
the root of the use of logarithm tables and addition to perform multiplication, namely:


      ×/X ←→ *+/⍟X


The use of base ten logarithms rests similarly on duality with respect to the monadic function
10⍟⍵ and its inverse 10*⍵:


      ×/X ←→ 10*+/10⍟X


5. Proofs 


A proof is a demonstration of the validity of an identity based upon other identities or facts already
proven or accepted. For example, deMorgan's law may be proved by simply evaluating the two
supposedly equivalent expressions (X∧Y and ~(~X)∨(~Y)) for all possible combinations of boolean
values of X and Y:


      X   Y   X∧Y   ~X   ~Y   (~X)∨(~Y)   ~(~X)∨(~Y)
      0   0    0     1    1       1            0
      0   1    0     1    0       1            0
      1   0    0     0    1       1            0
      1   1    1     0    0       0            1


An identity which is useful and important enough to be used in the proofs of other identities is 
commonly called a theorem. Thus: 
Theorem 1        (A×B)∘.×(P×Q) ←→ (A∘.×P)×(B∘.×Q)


We will prove theorem 1 itself for vectors A, B, P, and Q by calling the results of the left and right
expressions L and R and showing that for any indices I and J, the values of L[I;J] and R[I;J]
agree. We do this by writing a sequence of equivalent expressions, citing at the right of each expression
the basis for believing it to be equivalent to the preceding one. Thus:


      L[I;J]
      ((A×B)∘.×(P×Q))[I;J]             Def of L
      (A×B)[I]×(P×Q)[J]                Def of ∘.×
      (A[I]×B[I])×(P[J]×Q[J])          Def of vector ×

      R[I;J]
      ((A∘.×P)×(B∘.×Q))[I;J]           Def of R
      (A∘.×P)[I;J]×(B∘.×Q)[I;J]        Def of matrix ×
      (A[I]×P[J])×(B[I]×Q[J])          Def of ∘.×
      (A[I]×B[I])×(P[J]×Q[J])          × associates and commutes


Comparison of the expressions ending the two sequences completes the proof. 


We will now state a second theorem (whose proof for vector variables is given in Iverson [6]), and
use it in a proof that the product of two polynomials C P X and D P X is equivalent to the expression
+/,(C∘.×D)×X*(⍳⍴C)∘.+⍳⍴D:


Theorem 2        +/,V∘.×W ←→ (+/V)×(+/W)

Thus:

Theorem 3        (C P X)×(D P X)
                 (+/C×X*E←⍳⍴C)×(+/D×X*F←⍳⍴D)        Def of P
                 +/,(C×X*E)∘.×(D×X*F)               Theorem 2
                 +/,(C∘.×D)×((X*E)∘.×(X*F))         Theorem 1
                 +/,(C∘.×D)×X*E∘.+F


The final step is based on the fact that (X*A)×(X*B) ←→ X*A+B.


A proof in which every step is fully justified is called a formal proof; a step which is justified less
formally by the observation of some general pattern is called an informal proof. We will now illustrate
an informal proof by assigning values to the arguments C and D and displaying the tables C∘.×D and
E∘.+F occurring in the last line of theorem 3:


      C←3 1 4                E←⍳⍴C
      D←2 0 3 1              F←⍳⍴D
      C∘.×D                  E∘.+F
6 0  9 3                 0 1 2 3
2 0  3 1                 1 2 3 4
8 0 12 4                 2 3 4 5


Since the elements of E∘.+F are exponents of X, and since the Ith diagonal of E∘.+F (beginning with
the zeroth) has the values I, each element of the Ith diagonal of C∘.×D is multiplied by X*I. We
may therefore conclude (informally) that the expression is equivalent to a polynomial whose coefficient
vector is formed by summing the diagonals of C∘.×D. Using theorem 3 as well, we therefore conclude
that this polynomial is equivalent to the product of the polynomials C P ⍵ and D P ⍵.




Many useful identities concern what are called (in APL Language [7] ) structural and selection 
functions, such as reshape, transpose, indexing, and compression. For example, a succession of dyadic 
transpositions can be reduced to a single equivalent transposition by the following identity: 


      I⍉J⍉A ←→ I[J]⍉A


The proof is given in Iverson [5]. Further examples of proofs in APL may be found in Orth [8] 
and in Iverson [1,4]. 


6. Recursive Definition 


A function can sometimes be defined very neatly by using it in its own definition. For example, the
factorial function F:×/1+⍳⍵ could be defined alternatively by saying that F ⍵ ←→ ⍵×F ⍵-1 and giving
the auxiliary information that in the case ⍵=0 the value of the function is 1. Such a definition which
utilizes the function being defined is called a recursive definition.


The direct definition form as defined in Iverson [4] permits a “conditional” definition such as:


      G:⍵:⍵<0:-⍵


Such a definition includes three expressions separated by colons and is interpreted by executing the
middle one, then executing the first or the last, according to whether the value of the (first element
of the) middle one is zero or not. Thus G ⍵ is (for scalar arguments) equivalent to |⍵.


This conditional form is convenient for making recursive definitions. For example, the factorial function
discussed above could be defined as F:⍵×F ⍵-1:⍵=0:1, and a function to generate the binomial
coefficients of a given order could be defined recursively as:


      BC:(Z,0)+0,Z←BC ⍵-1:⍵=0:1


For example:


      BC 2          BC 3          BC 4
1 2 1         1 3 3 1       1 4 6 4 1
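
The two recursive definitions translate almost word for word into Python, which may help in reading the conditional form. The names bc and factorial are assumptions of the sketch; each function tests the terminating case first and otherwise calls itself on an argument reduced by one.

```python
def factorial(n):
    """Recursive factorial: n times the factorial of n-1, with value 1 at 0."""
    return 1 if n == 0 else n * factorial(n - 1)

def bc(n):
    """Binomial coefficients of order n, formed as (Z,0)+0,Z from Z = bc(n-1)."""
    if n == 0:
        return [1]
    z = bc(n - 1)
    return [a + b for a, b in zip(z + [0], [0] + z)]
```

Adding a print call on the recursive result inside bc plays the same role as inserting a display point in the APL definition: it exhibits the intermediate results as the recursion unwinds.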


Recursive definition can be an extremely useful tool, but one that may require considerable effort to assimilate.
The study of existing recursive definitions (as in Chapters 7 and 8 of Orth [8] and Chapter 10 of Iverson
[4]) may prove helpful. Perhaps the best way to grasp a particular definition is to execute it in detail for a
few simple cases, either manually or on the computer. The details of computer execution can usually be suitably
exhibited by inserting ⎕← at one or more points in the definition. We might, for example, modify and
execute the binomial coefficient function BC as follows:


      BC:(Z,0)+0,Z←⎕←BC ⍵-1:⍵=0:1


      Q←BC 3


We will now give two less trivial recursive definitions for study. The first generates all permutations 
of a specified order as follows: 


PER:(-⌊(⍳!ω)÷!X)⌽X,((!ω),X)⍴PER X←ω-1:ω=1:1 1⍴0 


⍉PER 3            ⍉'ABCD'[PER 4] 
2 2 0 1 1 0       DDDDDDABBACCBACCABCCABBA 
1 0 2 2 0 1       CCABBADDDDDDABBACCBACCAB 
0 1 1 0 2 2       BACCABCCABBADDDDDDABBACC 
                  ABBACCBACCABCCABBADDDDDD 
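
The construction used by PER (prefix the largest index to repeated copies of the permutations of one lower order, then rotate successive blocks of rows by 0, 1, 2, ... places) can be sketched in Python. This is an illustrative paraphrase assuming 0-origin indexing as in the display above; the name per is hypothetical.

```python
from math import factorial


def per(n):
    """Return a list of rows, each a permutation of range(n)."""
    if n == 1:
        return [[0]]                      # 1 1⍴0
    x = n - 1
    sub = per(x)                          # PER X, with X←ω-1
    rows = []
    for i in range(factorial(n)):
        row = [x] + sub[i % len(sub)]     # X,((!ω),X)⍴PER X
        k = i // factorial(x)             # ⌊i÷!X, the rotation for this row
        # A negative left argument of ⌽ rotates to the right by k places.
        rows.append(row[-k:] + row[:-k] if k else row)
    return rows


if __name__ == "__main__":
    for row in per(4):
        print("".join("ABCD"[i] for i in row))
```

Printing the rows of per(4) as letters gives the columns of the transposed display above, beginning DCBA, DCAB, DABC.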


The second is a solution of the “topological sort” problem discussed on pages 258-268 of Knuth [9]. 
Briefly stated, an N by N boolean matrix can specify “precedences” required in the ordering of N items 
(which may represent the steps to be carried out in some production process). If the positions of the 
1’s in row I indicate which items must precede item I, then the function: 


PR:α[⍋(-⍴α)↑S] PR S⌿S/ω:∧/S←∨/ω:(-1↑⍴ω)↓α 


provides a solution in the sense that it permutes its vector left argument to satisfy the constraints 
imposed by the matrix right argument. For example: 


CT ATDEX! 
M EA PROC PHOC AUS) Pa Mz] 
OCIO 2-4 TPXAS ADDRESS TEXT 
0: 900.0 TEXT FIGURES 
0.20.32 STAMP XEROX 
00000 FIGURES ADDRESS 
0: db ¿QuE 10 XEROX STAMP 


If the required orderings among certain items are inconsistent and cannot be satisfied, they are 
suppressed from the result. 
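
The recursive scheme can be sketched in Python as follows. This is a paraphrase under stated assumptions rather than a transcription of PR: the name pr is hypothetical, and the precedence matrix M below is an illustrative guess chosen to be consistent with the item names in the example, not the matrix printed in the original. At each step the items with no outstanding predecessor are moved ahead of those still blocked, and the matrix is reduced to the blocked rows and columns.

```python
def pr(items, m):
    """items: list of names; m: square 0/1 matrix describing the LAST len(m)
    items, with m[i][j] == 1 meaning item j must precede item i."""
    n = len(m)
    s = [int(any(row)) for row in m]            # S←∨/ω : 1 where still blocked
    if all(s):                                  # ∧/S (also true when n == 0)
        return items[:len(items) - n]           # drop items that can never be placed
    head, tail = items[:len(items) - n], items[len(items) - n:]
    ready = [t for t, blocked in zip(tail, s) if not blocked]
    waiting = [t for t, blocked in zip(tail, s) if blocked]
    keep = [i for i, blocked in enumerate(s) if blocked]
    sub = [[m[i][j] for j in keep] for i in keep]   # S⌿S/ω
    return pr(head + ready + waiting, sub)


# Hypothetical precedences: ADDRESS needs XEROX, STAMP needs ADDRESS,
# XEROX needs TEXT and FIGURES.
ITEMS = ["ADDRESS", "TEXT", "STAMP", "FIGURES", "XEROX"]
M = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
]

if __name__ == "__main__":
    print(pr(ITEMS, M))
```

Items caught in a cycle never become ready, so they remain in the final matrix and are dropped from the result.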


7. Properties of Defined Functions 


Defined functions used as building blocks in the development of a complex system play much the same 
role as primitives, and the comments made on the assimilation of primitives apply equally to such 
defined functions. Moreover, a clear understanding of the properties of functions under design may 
contribute to their design. 


Many of the general properties of primitives (such as their systematic extension to arrays and the 
existence of primitive inverse functions) are also useful in defined functions and should be preserved 
as much as possible. The section on generality addressed certain aspects of this, and we now briefly 
address some others, including choice of names, application of operators, and the provision of inverse 
functions. 


The names of primitive functions are graphic symbols, and the ease of distinguishing them from the 
names of arguments contributes to the readability of expressions. It is also possible to adopt naming 
schemes which distinguish defined functions from arguments, or which even distinguish several sub- 
classes of defined functions. The choice of mnemonic names for functions can also contribute to clarity; 
the use of the direct form of definition properly focusses attention on the choice of function names 
rather than on the choice of argument names. 


Present APL implementations limit the application of operators (such as reduction and inner product) 
to primitive functions, and do not allow the use of defined functions in expressions such as F/ and 
∘.F. For any defined function F it is sometimes useful (although questions of efficiency may limit 
the usefulness to experimentation rather than general use) to define a corresponding outer product 
function OPF, and a corresponding reduction function RF. For example: 


F:α+÷ω 
OPF:(α∘.+0×ω) F (α×0)∘.+ω 
RF:(1↑ω) F RF 1↓ω:1=⍴ω:ω[0] 


ASS 7 Ad 
B42 5 10 
AFB A OPF B RE p 
FeO. Le2 I4el Se sae re 2.196078431 
Dior ge 742 
ties Tisz iyi 
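
The idea of supplying an outer product and a reduction for a defined function can be sketched in Python. The sketch assumes that F is the function α+÷ω (left argument plus the reciprocal of the right), which agrees with the printed value 2.196078431 for the reduction over 2 5 10; the names f, outer and reduce_right are illustrative only.

```python
def f(a, w):
    # F: α+÷ω
    return a + 1 / w


def outer(fn, a, b):
    # Table of fn applied to every pair (element of a, element of b).
    return [[fn(x, y) for y in b] for x in a]


def reduce_right(fn, v):
    # APL reduction groups from the right: v[0] fn (v[1] fn (v[2] ...)).
    if len(v) == 1:
        return v[0]
    return fn(v[0], reduce_right(fn, v[1:]))


if __name__ == "__main__":
    b = [2, 5, 10]
    print(outer(f, [1, 2], [2, 4]))
    print(round(reduce_right(f, b), 9))
```

Reduction by this F evaluates a continued fraction, here 2+1÷(5+1÷10).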


The importance of inverse functions in mathematics is indicated in part by the number of inverse pairs 
of functions provided, such as the pair K○ω and (-K)○ω, the pair B⍟ω and B*ω, and the pair 
ω*N and ω*÷N. Their importance in non-numeric applications is not so commonly recognized, and 
it is well to keep the matter in mind in designing functions. For example, in designing functions 
GET and PUT for accessing files, it is advantageous to design them as inverses in the sense that the 
expression K PUT K GET 'FILENAME' will produce no change in the file. 


Other examples of useful inverse pairs include the permutations ω[P] and ω[⍋P] defined by a given 
permutation vector P, the classification function C:ω[⍋α] and its inverse (discussed in Section 4) 
CI:ω[⍋⍋α], and the “cumulative sum” or “integration” function CS and its inverse, the “difference 
function” DF defined as follows: 


CS:+\ω 
DF:ω-0,¯1↓ω 
A←3 5 7 11 13 17 
CS A                 DF A 
3 8 15 26 39 56      3 2 2 4 2 4 
DF CS A              CS DF A 
3 5 7 11 13 17       3 5 7 11 13 17 
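
The inverse relation between cumulative sum and difference can be checked with a short Python sketch; the names cs and df simply echo CS and DF, and the data are those of the example.

```python
from itertools import accumulate


def cs(v):
    # CS: +\ω  (running totals)
    return list(accumulate(v))


def df(v):
    # DF: ω-0,¯1↓ω  (each element minus its predecessor, the first minus 0)
    return [x - y for x, y in zip(v, [0] + v[:-1])]


if __name__ == "__main__":
    a = [3, 5, 7, 11, 13, 17]
    print(cs(a), df(a))
    print(df(cs(a)), cs(df(a)))
```

Applying the two functions in either order returns the original vector.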


8. Efficiency 


Emphasis on clarity of expression in designing a system may contribute greatly to its efficiency by 
leading to the choice of a superior overall approach, but it may also lead to solutions which violate 
the space constraints of a particular implementation or make ineffective use of the facilities which it 
provides. It is therefore necessary at some point to consider the characteristics of the particular 
implementation to be used. The speed and space characteristics of the various implementations of APL 
are too varied to be considered here. There are, however, a number of identities which are of rather 
general use. 


Expressions involving inner and outer products often lead to space requirements which can be alleviated 
by partitioning the arguments. For example, if A and B are vectors and R←A∘.f B, then the M 
by N segment of the result represented by (M,N)↑R can be computed as (M↑A)∘.f(N↑B), and M and 
N can be chosen to make the best use of available space. The resulting segments may be stored in 
files or, if the subsequent expressions to be applied to the result permit it, they may be applied to 
the segments. For example, if the complete expression is +/A∘.f B, then each of the segments may 
be summed as they are produced. Expressions of the form (M,N)↑R can also be generalized to apply 
to higher rank arrays and to select any desired rectangular segment. 


If X is a vector, the reduction +/X can be partitioned by use of the identity: 
+/X ←→ (+/K↑X)+(+/K↓X) 


and this identity applies more generally for reduction by any associative function F. Moreover, this 
identity provides the basis for the partitioning of inner products, a generalization of the partitioning 
used in matrix algebra which is discussed more fully in Iverson [6]. 
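
Both forms of partitioning can be sketched in Python. The function names are hypothetical, and the second function illustrates only the case in which the segments of an outer product are summed as they are produced, so that the full table never exists at one time.

```python
def partitioned_sum(x, k):
    # +/X ←→ (+/K↑X)+(+/K↓X)
    return sum(x[:k]) + sum(x[k:])


def sum_outer_in_blocks(a, b, fn, m):
    """Sum every entry of the table fn(a[i], b[j]), building only m rows at a time."""
    total = 0
    for start in range(0, len(a), m):
        block = [[fn(p, q) for q in b] for p in a[start:start + m]]
        total += sum(sum(row) for row in block)
    return total


if __name__ == "__main__":
    print(partitioned_sum(list(range(10)), 4))
    print(sum_outer_in_blocks([1, 2, 3], [4, 5], lambda p, q: p * q, 2))
```

The block size m plays the part of M in the text and can be chosen to suit the space available.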


The direct use of the distribution function DIS of Section 3 for summarization (in the form 
(DIS A)+.×C) may lead to excessive use of both time and space. Such problems can often be alleviated 
in a general way by the use of sorting. For example, the expression R←A[P←⍋A] produces an ordered 
list of the account numbers in which all repetitions of any one account number are adjacent. The points 
of change in account numbers are therefore given by the boolean vector B←R≠¯1⌽R and if the costs 
C are ordered similarly by S←C[P], then the summarization may be performed by summing over the 
intervals of S marked off by B. 
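
A minimal Python sketch of summarization by sorting follows. The name summarize and the sample data are assumptions made for illustration. One deliberate difference from the expression in the text: the first position is always treated as a point of change, rather than being compared with the last element as a rotation would do.

```python
def summarize(accounts, costs):
    p = sorted(range(len(accounts)), key=lambda i: accounts[i])   # P←⍋A
    r = [accounts[i] for i in p]                                  # R←A[P]
    s = [costs[i] for i in p]                                     # S←C[P]
    # Points of change: an account number differing from its predecessor.
    b = [i == 0 or r[i] != r[i - 1] for i in range(len(r))]
    keys, totals = [], []
    for acct, cost, new in zip(r, s, b):
        if new:
            keys.append(acct)
            totals.append(0)
        totals[-1] += cost          # sum over the interval marked off by b
    return keys, totals


if __name__ == "__main__":
    print(summarize([3, 1, 3, 2, 1], [10, 20, 30, 40, 50]))
```

Only one pass over the sorted costs is needed, in contrast to the account-by-item table implied by the inner product form.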


The sorting process discussed above may itself be partitioned, and the subsequent summarization steps 
may, for reasons of efficiency, be incorporated directly in the sorting process. Many of the uses of 
sorting in data processing are in fact obvious or disguised realizations of some classification problem, 
and a simpler statement of the essential process may lead simply to different efficient realizations 
appropriate to different implementations of APL. 


Like the inner and outer product, recursive definitions often make excessive demands on space. In 
some cases, as in the function PER discussed in Section 6, the size of the arguments to which the 
function is successively applied decreases so rapidly that the recursive definition does not greatly 
increase the space requirements. In others, as in the function PR of Section 6, the space requirements 
may be excessive, and the recursive definition can be translated (usually in a straightforward manner) 
into a more space-efficient iterative program. For example, the following non-recursive definition is 
such a translation of the function PR: 


Z←A PRN W 
L1:→(∧/S←∨/W)/L2 
A←A[⍋(-⍴A)↑S] 
W←S⌿S/W 
→L1 
L2:Z←(-1↑⍴W)↓A 


9. Reading 


Perhaps the most important habit in the development of good style in a language remains to be 
mentioned, the habit of critical reading. Such reading should not be limited to collections of well-turned 
and useful phrases, such as Bartlett’s Quotations or the collections of References 2 and 3, nor should 
it be limited to topics in a reader’s particular speciality. 


Manuals and other books about a language are, like grammars and dictionaries in natural language, 
essential, but reading should not be confined to them. Emphasis should be placed rather on the reading 
of books which use the language in the treatment of other topics, as in the references already cited, 
in Berry et al [10,11], in Blaauw [12], and in Spence [13]. 


The APL neophyte should not be dissuaded from reading by the occurrence of long expressions whose 
meanings are not immediately clear; because the sequence of execution is clear and unambiguous, the 
reader can always work through sample executions accurately, either with pencil and paper, with a 
computer, or both. An example of this is discussed at length in Section 1.1 of Iverson [4]. 


Moreover, the neophyte need not be dissuaded from reading by the occurrence of some unfamiliar 
primitives, since all primitives can be summarized (together with examples) in two brief tables (pages 
32 and 44 of APL Language [7]), and since these tables are usable after the reading of two short 
sections: Fundamentals (pages 21-28) and Operators (pages 39-43). 


Finally, one may benefit from the critical reading of mediocre writing as well as good; good writing 
may present new turns of phrase, but mediocre writing may spur the reader to improve upon it. 


10. Conclusions 


This paper has addressed the question of style, the manner in which something is said as distinct 
from the substance. The techniques suggested for fostering good style are analogous to techniques 
appropriate to natural language: intimate knowledge of vocabulary (primitives) and commonly used 
phrases (certain defined functions), facility in abstract expression (generality), mastery of a variety 
of equivalent ways of expressing a matter (identities), a knowledge of techniques for examining and 
establishing such equivalences (proofs), a precise general method for using an expression in its own 
definition (recursion), and an emphasis on wide critical reading in rather than about the language. 


If one accepts the importance of good style in APL, then one should consider the implications of these 
techniques for the teaching of APL. Current courses and textbooks typically follow the inappropriate 
model set by the teaching of earlier programming languages, which are not so simply structured and 
not so easy to introduce (as one introduces mathematical notation) in the context of some reasonably 
elaborate use of the language. Moreover, they place little or no emphasis on reading in APL and 
little on the structure of the language, often confusing, for example, the crucial distinction between 
operators and functions by using the same term for both. APL Language [7] does present this 
structure, but, being designed for reference, is not itself a sufficient basis for a course. 


Appendix A 
Translation from Direct to Del Form 


The problem of translation from the direct to the del form of function definition is fully discussed 
in Section 10.4 of Iverson [4], the discussion culminating in a set of translation functions usable (or 
easily adapted for use) on most implementations of APL. Because it is aimed primarily at an 
exposition of the translation problem, the functions developed in this presentation leave many secondary 
problems (such as the avoidance of name conflicts) to the user, and the following translation functions 
and associated variables may be found more convenient for experimentation with the use of direct 
definition: 


DFO EEL idi KQO 

>((2|+/E='''t)va/ 1 3 ž+/':!' I9 E)/pDe(2p0I0<0)p'! 

Feta X9 ' R9 'w YS ' RO Ex, 1 1 ICR OFX 'Q',' ',D0.5],E 
F+1+pD+(0,-6-+/I)y(-(3xI)++\I=':'! I9 F)OQ(7,0F)o(7xpF)tF 

De30(C9[((2L21W/taw! I9 E),1+1),5;]) ,8D[L;0,(I+2+1F-2),1] 

J+(( 16I)AJ+>f 0 1 014]! I9 E)/K++\I<O, 14I+Fe AQ 

Kev /((-K)oTo.>1i14+f /K) [37-1] 

D+-D,(F,oE)+8 0 2 ¥(K+2xK<10K)o!'! *,E,[0.5)] ';! 


Z+X R9 Y;N 
Z+(,((1tX) I9 Y)o.#Nt1)/,Y,((pY), 1+NM*pX)p14X 


¿+A 19 B 
Ze(Ao.=B)A((pA) „oB)pr2|+\B=' 1" 


29+DEF 
ZI+HJFX F9 N 


C9 
29+ 
Y929+ 
Y9Z9<X9 
)/3>(0=1t, 
>0,0p49+ 
29+ 


Ag 


The foregoing functions were designed more for brevity than clarity; nevertheless the reader who 
wishes to study the translation process in detail may find it useful to compare them with those of 
Reference 4. 


For serious use of direct definition, one should augment the foregoing with functions which record 
the definitions presented, display them on demand, and provide for convenient editing. For example, 
execution of: 


DEF 


DEF R Opal E OO EX FO AAN 
DEFR 


produces a function DEFR which, like DEF, fixes the definition of any function F presented to it in 
direct form, but which also records the original definition (for later display or editing) in the associated 
variable RF. The display of a desired function could then be produced by the following definition: 


DEFR 
DISPLAY:%,(NA.=( 1tpN)t'R'! NH) 4VNL 2 


For example: 


DEFR 
PLUS:α+ω 
DISPLAY 
PLUS 
PLUS:α+ω 


The importance of nomenclature, notation, and 
language as tools of thought has long been recog- 
nized. In chemistry and in botany, for example, 
the establishment of systems of nomenclature by 
Lavoisier and Linnaeus did much to stimulate and 
to channel later investigation. Concerning lan- 
guage, George Boole in his Laws of Thought 
[1, p.24] asserted "That language is an instru- 
ment of human reason, and not merely a medium 
for the expression of thought, is a truth generally 
admitted." 

Mathematical notation provides perhaps the 
best-known and best-developed example of lan- 
guage used consciously as a tool of thought. Recog- 
nition of the important role of notation in mathe- 
matics is clear from the quotations from mathema- 
ticians given in Cajori's A History of Mathemat- 
ical Notations [2, pp.332,331]. They are well 
worth reading in full, but the following excerpts 
suggest the tone: 


By relieving the brain of all unnecessary work, 
a good notation sets it free to concentrate on 
more advanced problems, and in effect increases 
the mental power of the race. 

A.N. Whitehead 


The quantity of meaning compressed into small 
space by algebraic signs, is another circum- 
stance that facilitates the reasonings we are 
accustomed to carry on by their aid. 

Charles Babbage 


Nevertheless, mathematical notation has seri- 
ous deficiencies. In particular, it lacks universali- 
ty, and must be interpreted differently according 
to the topic, according to the author, and even 
according to the immediate context. Programming 
languages, because they were designed for the purpose 
of directing computers, offer important advantages 
as tools of thought. Not only are they universal 
(general-purpose), but they are also executable 
and unambiguous. Executability makes it possible to 
use computers to perform extensive experiments on 
ideas expressed in a programming language, and the 
lack of ambiguity makes possible precise thought 
experiments. In other respects, however, most 
programming languages are decidedly inferior to 
mathematical notation and are little used as tools 
of thought in ways that would be considered 
significant by, say, an applied mathematician. 


The thesis of the present paper is that the ad- 
vantages of executability and universality found in 
programming languages can be effectively com- 
bined, in a single coherent language, with the ad- 
vantages offered by mathematical notation. It is 
developed in four stages: 


(a) Section 1 identifies salient characteristics of 
mathematical notation and uses simple prob- 
lems to illustrate how these characteristics may 
be provided in an executable notation. 


(b) Sections 2 and 3 continue this illustration by 
deeper treatment of a set of topics chosen for 
their general interest and utility. Section 2 
concerns polynomials, and Section 3 concerns 
transformations between representations of 
functions relevant to a number of topics, includ- 
ing permutations and directed graphs. Al- 
though these topics might be characterized as 
mathematical, they are directly relevant to 
computer programming, and their relevance 
will increase as programming continues to de- 
velop into a legitimate mathematical discipline. 


(c) Section 4 provides examples of identities and 
formal proofs. Many of these formal proofs 
concern identities established informally and 
used in preceding sections. 


(d) The concluding section provides some general 
comparisons with mathematical notation, refer- 
ences to treatments of other topics, and discus- 
sion of the problem of introducing notation in 
context. 

The executable language to be used is APL, a 
general purpose language which originated in an 
attempt to provide clear and precise expression in 
writing and teaching, and which was implemented 
as a programming language only after several years 
of use and development [3]. 

Although many readers will be unfamiliar with 
APL, I have chosen not to provide a separate intro- 
duction to it, but rather to introduce it in context 
as needed. Mathematical notation is always intro- 
duced in this way rather than being taught, as pro- 
gramming languages commonly are, in a separate 
course. Notation suited as a tool of thought in any 
topic should permit easy introduction in the con- 
text of that topic; one advantage of introducing 
APL in context here is that the reader may assess 
the relative difficulty of such introduction. 

However, introduction in context is incompati- 
ble with complete discussion of all nuances of each 
bit of notation, and the reader must be prepared to 
either extend the definitions in obvious and sys- 
tematic ways as required in later uses, or to con- 
sult a reference work. All of the notation used 
here is summarized in Appendix A, and is covered 
fully in pages 24-60 of APL Language [4]. 

Readers having access to some machine embodi- 
ment of APL may wish to translate the function 
definitions given here in direct definition form 
[5, p.10] (using α and ω to represent the left and 
right arguments) to the canonical form required 
for execution. A function for performing this 
translation automatically is given in Appendix B. 


1. Important Characteristics of Notation 


In addition to the executability and universali- 
ty emphasized in the introduction, a good notation 
should embody characteristics familiar to any user 
of mathematical notation: 


• Ease of expressing constructs arising in problems. 
• Suggestivity. 
• Ability to subordinate detail. 
• Economy. 
• Amenability to formal proofs. 

The foregoing is not intended as an exhaustive list, 
but will be used to shape the subsequent discus- 
sion. 

Unambiguous executability of the notation in- 
troduced remains important, and will be emphasiz- 
ed by displaying below an expression the explicit 
result produced by it. To maintain the distinction 
between expressions and results, the expressions 
will be indented as they automatically are on APL 
computers. For example, the integer function denoted 
by ⍳ produces a vector of the first N integers when 
applied to the argument N, and the sum reduction 
denoted by +/ produces the sum of the elements of its 
vector argument, and will be shown as follows: 


      ⍳5 
1 2 3 4 5 
      +/⍳5 
15 

We will use one non-executable bit of notation: 
the symbol ←→ appearing between two expressions 
asserts their equivalence. 


1.1 Ease of Expressing Constructs Arising in 
Problems 

If it is to be effective as a tool of thought, a 
notation must allow convenient expression not only 
of notions arising directly from a problem, but also 
of those arising in subsequent analysis, generaliza- 
tion, and specialization. 

Consider, for example, the crystal structure 
illustrated by Figure 1, in which successive layers 
of atoms lie not directly on top of one another, but 
lie "close-packed" between those below them. The 
numbers of atoms in successive rows from the top 
in Figure 1 are therefore given by ⍳5, and the total 
number is given by +/⍳5. 

The three-dimensional structure of such a crystal 
is also close-packed; the atoms in the plane 
lying above Figure 1 would lie between the atoms 
in the plane below it, and would have a base row of 
four atoms. The complete three-dimensional 
structure corresponding to Figure 1 is therefore a 
tetrahedron whose planes have bases of lengths 1, 2, 
3, 4, and 5. The numbers in successive planes are 
therefore the partial sums of the vector ⍳5, that 
is, the sum of the first element, the sum of the 
first two elements, etc. Such partial sums of a 
vector V are denoted by +\V, the function +\ being 
called sum scan. Thus: 

      +\⍳5 
1 3 6 10 15 
      +/+\⍳5 
35 


The final expression gives the total number of at- 
oms in the tetrahedron. 

The sum +/⍳5 can be represented graphically in 
other ways, such as shown on the left of Figure 2. 
Combined with the inverted pattern on the right, 
this representation suggests that the sum may be 
simply related to the number of units in a rectan- 
gle, that is, to a product. 

The lengths of the rows of the figure formed by 
pushing together the two parts of Figure 2 are given 
by adding the vector ⍳5 to the same vector reversed. 
Thus: 

      ⍳5 
1 2 3 4 5 
      ⌽⍳5 
5 4 3 2 1 
      (⍳5)+(⌽⍳5) 
6 6 6 6 6 

    Fig. 1                Fig. 2 
      O           □          □□□□□ 
     O O          □□          □□□□ 
    O O O         □□□          □□□ 
   O O O O        □□□□          □□ 
  O O O O O       □□□□□          □ 

This pattern of 5 repetitions of 6 may be expressed 
as 5⍴6, and we have: 

      5⍴6 
6 6 6 6 6 
      +/5⍴6 
30 


The fact that +/5⍴6 ←→ 6×5 follows from the definition 
of multiplication as repeated addition. 

The foregoing suggests that +/⍳5 ←→ (6×5)÷2, and, 
more generally, that: 


      +/⍳N ←→ ((N+1)×N)÷2                    A.1 
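
Readers without access to APL may wish to try the expressions of this section in another executable notation. The following Python sketch is one possible paraphrase; the names iota, sum_scan and gauss are chosen here for illustration and do not appear in the paper.

```python
from itertools import accumulate


def iota(n):
    # ⍳N in 1-origin: the first N integers
    return list(range(1, n + 1))


def sum_scan(v):
    # +\V : partial sums of V
    return list(accumulate(v))


def gauss(n):
    # Right side of A.1: ((N+1)×N)÷2
    return (n + 1) * n // 2


if __name__ == "__main__":
    print(sum(iota(5)))                  # atoms in the triangle of Figure 1
    print(sum_scan(iota(5)))             # atoms in successive planes
    print(sum(sum_scan(iota(5))))        # atoms in the tetrahedron
    print(all(sum(iota(n)) == gauss(n) for n in range(100)))
```

The last line tests A.1 for a range of cases; such an experiment suggests the identity but does not prove it.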


1.2 Suggestivity 

A notation will be said to be suggestive if the 
forms of the expressions arising in one set of prob- 
lems suggest related expressions which find appli- 
cation in other problems. We will now consider 
related uses of the functions introduced thus far, 
namely: 


      ⍳   ⌽   ⍴   +/   +\ 


The example: 


      5⍴2 
2 2 2 2 2 
      ×/5⍴2 
32 

suggests that ×/M⍴N ←→ N*M, where * represents the 
power function. The similarity between the definitions 
of power in terms of times, and of times in 
terms of plus may therefore be exhibited as follows: 

      ×/M⍴N ←→ N*M 
      +/M⍴N ←→ N×M 

Similar expressions for partial sums and partial 
products may be developed as follows: 

      ×\5⍴2 
2 4 8 16 32 
      2*⍳5 
2 4 8 16 32 

      ×\M⍴N ←→ N*⍳M 
      +\M⍴N ←→ N×⍳M 

Because they can be represented by a triangle as 
in Figure 1, the sums +\⍳5 are called triangular 
numbers. They are a special case of the figurate 
numbers obtained by repeated applications of sum 
scan, beginning either with +\⍳N, or with +\N⍴1. 
Thus: 


      5⍴1                 +\+\5⍴1 
1 1 1 1 1            1 3 6 10 15 

      +\5⍴1               +\+\+\5⍴1 
1 2 3 4 5            1 4 10 20 35 


Replacing sums over the successive integers by 
products yields the factorials as follows: 


      ⍳5 
1 2 3 4 5 
      ×/⍳5                ×\⍳5 
120                  1 2 6 24 120 
      !5                  !⍳5 
120                  1 2 6 24 120 


Part of the suggestive power of a language re- 
sides in the ability to represent identities in brief, 
general, and easily remembered forms. We will 
illustrate this by expressing dualities between 
functions in a form which embraces DeMorgan's 
laws, multiplication by the use of logarithms, and 
other less familiar identities. 

If V is a vector of positive numbers, then the 
product ×/V may be obtained by taking the natural 
logarithms of each element of V (denoted by ⍟V), 
summing them (+/⍟V), and applying the exponential 
function (*+/⍟V). Thus: 


      ×/V ←→ *+/⍟V 


Since the exponential function * is the inverse of 
the natural logarithm ⍟, the general form suggested 
by the right side of the identity is: 


      IG F/G V 

where IG is the function inverse to G. 


Using ∧ and ∨ to denote the functions and and 
or, and ~ to denote the self-inverse function of 
logical negation, we may express DeMorgan's laws 
for an arbitrary number of elements by: 

      ∧/B ←→ ~∨/~B 
      ∨/B ←→ ~∧/~B 

The elements of B are, of course, restricted to the 
boolean values 0 and 1. Using the relation symbols 
to denote functions (for example, X<Y yields 1 if X 
is less than Y and 0 otherwise) we can express further 
dualities, such as: 


      ≠/B ←→ ~=/~B 
      =/B ←→ ~≠/~B 

Finally, using ⌈ and ⌊ to denote the maximum 
and minimum functions, we can express dualities 
which involve arithmetic negation: 


      ⌈/V ←→ -⌊/-V 
      ⌊/V ←→ -⌈/-V 

It may also be noted that scan (F\) may replace 
reduction (F/) in any of the foregoing dualities. 
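
The dualities above lend themselves to experiment. The following Python sketch checks them by exhaustion over all boolean vectors of up to four elements; it is an illustration under the assumption that reduction groups from the right, as it does in APL, and all names in it are hypothetical.

```python
from functools import reduce
from itertools import product


def reduce_right(fn, v):
    # F/V groups from the right: v[0] F (v[1] F (... F v[-1]))
    return reduce(lambda acc, x: fn(x, acc), reversed(v[:-1]), v[-1])


def negate_bits(b):
    # ~B for a list of 0s and 1s
    return [1 - x for x in b]


AND = lambda a, b: a & b
OR = lambda a, b: a | b
NE = lambda a, b: int(a != b)
EQ = lambda a, b: int(a == b)


def boolean_dualities_hold(max_len=4):
    for n in range(1, max_len + 1):
        for bits in product([0, 1], repeat=n):
            b, nb = list(bits), negate_bits(list(bits))
            if reduce_right(AND, b) != 1 - reduce_right(OR, nb):
                return False
            if reduce_right(OR, b) != 1 - reduce_right(AND, nb):
                return False
            if reduce_right(NE, b) != 1 - reduce_right(EQ, nb):
                return False
            if reduce_right(EQ, b) != 1 - reduce_right(NE, nb):
                return False
    return True


def max_min_dual(v):
    # ⌈/V ←→ -⌊/-V  and  ⌊/V ←→ -⌈/-V
    return max(v) == -min(-x for x in v) and min(v) == -max(-x for x in v)


if __name__ == "__main__":
    print(boolean_dualities_hold(), max_min_dual([3, -1, 4, 1, -5]))
```

The grouping from the right matters for the relations ≠ and =, which are applied here as ordinary two-argument functions returning 0 or 1.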


1.3 Subordination of Detail 

As Babbage remarked in the passage cited by 
Cajori, brevity facilitates reasoning. Brevity is 
achieved by subordinating detail, and we will here 
consider three important ways of doing this: the 
use of arrays, the assignment of names to functions 
and variables, and the use of operators. 

We have already seen examples of the brevity 
provided by one-dimensional arrays (vectors) in 
the treatment of duality, and further subordina- 
tion is provided by matrices and other arrays of 
higher rank, since functions defined on vectors are 
extended systematically to arrays of higher rank. 

In particular, one may specify the axis to which 
a function applies. For example, ⌽[1]M acts along 
the first axis of a matrix M to reverse each of the 
columns, and ⌽[2]M reverses each row; M,[1]N catenates 
columns (placing M above N), and M,[2]N catenates 
rows; and +/[1]M sums columns and +/[2]M 
sums rows. If no axis is specified, the function 
applies along the last axis. Thus +/M sums rows. 
Finally, reduction and scan along the first axis 
may be denoted by the symbols ⌿ and ⍀. 

Two uses of names may be distinguished: 
constant names which have fixed referents are 
used for entities of very general utility, and ad hoc 
names are assigned (by means of the symbol ←) to 
quantities of interest in a narrower context. For 
example, the constant (name) 144 has a fixed referent, 
but the names CRATE, LAYER, and ROW assigned by 
the expressions 

      CRATE ← 144 
      LAYER ← CRATE÷8 
      ROW ← LAYER÷3 

are ad hoc, or variable names. Constant names for 
vectors are also provided, as in 2 3 5 7 11 for a 
numeric vector of five elements, and in 'ABCDE' for a 
character vector of five elements. 


Analogous distinctions are made in the names 
of functions. Constant names such as +, ×, and * 
are assigned to so-called primitive functions of 
general utility. The detailed definitions, such as 
+/M⍴N for N×M and ×/M⍴N for N*M, are subordinated by 
the constant names × and *. 

Less familiar examples of constant function 
names are provided by the comma which catenates 
its arguments as illustrated by: 


      (⍳5),(⌽⍳5) ←→ 1 2 3 4 5 5 4 3 2 1 


and by the base-representation function ⊤, which 
produces a representation of its right argument in 
the radix specified by its left argument. For example: 

      2 2 2 ⊤ 3 ←→ 0 1 1 
      2 2 2 ⊤ 4 ←→ 1 0 0 

      BN←2 2 2 ⊤ 0 1 2 3 4 5 6 7 
      BN 
0 0 0 0 1 1 1 1 
0 0 1 1 0 0 1 1 
0 1 0 1 0 1 0 1 


The matrix BN is an important one, since it can be 
viewed in several ways. In addition to representing 
the binary numbers, the columns represent all subsets 
of a set of three elements, as well as the entries 
in a truth table for three boolean arguments. 
The general expression for N elements is easily seen 
to be (N⍴2)⊤(⍳2*N)-1, and we may wish to assign an 
ad hoc name to this function. Using the direct 
definition form (Appendix B), the name T is assigned 
to this function as follows: 


      T:(ω⍴2)⊤(⍳2*ω)-1                    A.2 


The symbol ω represents the argument of the function; 
in the case of two arguments the left is represented 
by α. Following such a definition of the 
function T, the expression T 3 yields the boolean 
matrix BN shown above. 
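
The function T of A.2 can be paraphrased in Python as a short sketch. The name t is an assumption made for illustration; the result is a list of rows, most significant digit first, so that column j holds the binary representation of j.

```python
def t(n):
    # T: (ω⍴2)⊤(⍳2*ω)-1
    # Row i holds binary digit i (most significant first) of 0, 1, ..., 2**n - 1.
    return [[(j >> (n - 1 - i)) & 1 for j in range(2 ** n)] for i in range(n)]


if __name__ == "__main__":
    for row in t(3):
        print(*row)
```

Read by columns, t(3) lists the eight subsets of a set of three elements, a 1 marking membership.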

Three expressions, separated by colons, are also 
used to define a function as follows: the middle 
expression is executed first; if its value is zero the 
first expression is executed, if not, the last expres- 
sion is executed. This form is convenient for re- 
cursive definitions, in which the function is used 
in its own definition. For example, a function 
which produces binomial coefficients of an order 
specified by its argument may be defined recursively 
as follows: 


      BC:(X,0)+(0,X←BC ω-1):ω=0:1                    A.3 


Thus BC 0 ←→ 1 and BC 1 ←→ 1 1 and BC 4 ←→ 1 4 6 4 1. 

The term operator, used in the strict sense 
defined in mathematics rather than loosely as a 
synonym for function, refers to an entity which 
applies to functions to produce functions; an exam- 
ple is the derivative operator. 

We have already met two operators, reduction, 
and scan, denoted by / and \, and seen how they 
contribute to brevity by applying to different func- 
tions to produce families of related functions such 
as +/ and ×/ and ∧/. We will now illustrate the 
notion further by introducing the inner product 
operator denoted by a period. A function (such as 
+/) produced by an operator will be called a 
derived function. 

If P and Q are two vectors, then the inner product 
+.× is defined by: 


      P+.×Q ←→ +/P×Q 


and analogous definitions hold for function pairs 
other than + and ×. For example: 

      P←2 3 5 
      Q←2 1 2 
      P+.×Q 
17 
      P×.*Q 
300 
      P⌊.+Q 
4 


Each of the foregoing expressions has at least 
one useful interpretation: P+.×Q is the total cost of 
order quantities Q for items whose prices are given 
by P; because P is a vector of primes, P×.*Q is the 
number whose prime decomposition is given by the 
exponents Q; and if P gives distances from a source 
to transhipment points and Q gives distances from 
the transhipment points to the destination, then 
P⌊.+Q gives the minimum distance possible. 

The function +.× is equivalent to the inner product 
or dot product of mathematics, and is extended to 
matrices as in mathematics. Other cases such as 
×.* are extended analogously. For example, if T is 
the function defined by A.2, then: 


      T 3                    P+.×T 3 
0 0 0 0 1 1 1 1        0 5 3 8 2 7 5 10 
0 0 1 1 0 0 1 1 
0 1 0 1 0 1 0 1              P×.*T 3 
                       1 5 3 15 2 10 6 30 

These examples bring out an important point: if 
B is boolean, then P+.×B produces sums over subsets 
of P specified by 1's in B, and P×.*B produces products 
over subsets. 
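
The generalized inner product can be sketched in Python by passing the two functions as arguments. The names inner and over_columns are hypothetical, and the matrix T3 is written out literally rather than computed; the vectors P and Q are those of the examples above.

```python
from functools import reduce
import operator


def inner(f, g, p, q):
    """P f.g Q for vectors: the f-reduction (grouped from the right) of P g Q."""
    terms = [g(a, b) for a, b in zip(p, q)]
    return reduce(lambda acc, x: f(x, acc), reversed(terms[:-1]), terms[-1])


def over_columns(f, g, p, matrix):
    # Vector-matrix case: apply the vector inner product to each column.
    return [inner(f, g, p, list(col)) for col in zip(*matrix)]


P = [2, 3, 5]
Q = [2, 1, 2]
T3 = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

if __name__ == "__main__":
    print(inner(operator.add, operator.mul, P, Q))   # P+.×Q
    print(inner(operator.mul, operator.pow, P, Q))   # P×.*Q
    print(inner(min, operator.add, P, Q))            # P⌊.+Q
    print(over_columns(operator.add, operator.mul, P, T3))
    print(over_columns(operator.mul, operator.pow, P, T3))
```

Treating the pair of functions as parameters is the closest Python analogue of an operator, which applies to functions to produce a function.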


The phrase ∘.× is a special use of the inner 
product operator to produce a derived function 
which yields products of each element of its left 
argument with each element of its right. For example: 


      2 3 5∘.×⍳5 
2  4  6  8 10 
3  6  9 12 15 
5 10 15 20 25 


The function ∘.× is called outer product, as it 
is in tensor analysis, and functions such as ∘.+ and 
∘.* and ∘.< are defined analogously, producing 
"function tables" for the particular functions. For 
example: 


      D←0 1 2 3 

      D∘.⌈D         D∘.≥D         D∘.!D 
0 1 2 3       1 0 0 0       1 1 1 1 
1 1 2 3       1 1 0 0       0 1 2 3 
2 2 2 3       1 1 1 0       0 0 1 3 
3 3 3 3       1 1 1 1       0 0 0 1 

The symbol ! denotes the binomial coefficient 
function, and the table D∘.!D is seen to contain 
Pascal's triangle with its apex at the left; if extended 
to negative arguments (as with D←¯3 ¯2 ¯1 0 1 2 3) 
it will be seen to contain the triangular and higher-order 
figurate numbers as well. This extension to 
negative arguments is interesting for other functions 
as well. For example, the table D∘.×D consists 
of four quadrants separated by a row and a column 
of zeros, the quadrants showing clearly the rule of 
signs for multiplication. 

Patterns in these function tables exhibit other 
properties of the functions, allowing brief statements 
of proofs by exhaustion. For example, commutativity 
appears as a symmetry about the diagonal. 
More precisely, if the result of the transpose 
function ⍉ (which reverses the order of the axes of 
its argument) applied to a table T←D∘.f D agrees with 
T, then the function f is commutative on the domain. 
For example, T=⍉T←D∘.⌈D produces a table of 
1's because ⌈ is commutative. 


Corresponding tests of associativity require rank 3 tables of the
form D∘.f(D∘.fD) and (D∘.fD)∘.fD. For example:

      D←0 1
      D∘.∧(D∘.∧D)   (D∘.∧D)∘.∧D   D∘.≤(D∘.≤D)   (D∘.≤D)∘.≤D
0 0           0 0           1 1           0 1
0 0           0 0           1 1           0 1

0 0           0 0           1 1           1 1
0 1           0 1           0 1           0 1
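
The table-based tests of commutativity and associativity can be sketched in Python as follows. The function names `table`, `is_commutative` and `assoc_tables` are hypothetical and chosen only for this illustration; nested lists stand in for the function tables.

```python
from math import comb

def table(f, xs, ys):
    """Function table: entry [i][j] is f(xs[i], ys[j])."""
    return [[f(x, y) for y in ys] for x in xs]

def is_commutative(f, dom):
    """f is commutative on dom when its table equals its own transpose."""
    t = table(f, dom, dom)
    return t == [list(col) for col in zip(*t)]

def assoc_tables(f, dom):
    """Rank 3 tables for (D f D) f D and D f (D f D), indexed [a][b][c]."""
    left = [[[f(f(a, b), c) for c in dom] for b in dom] for a in dom]
    right = [[[f(a, f(b, c)) for c in dom] for b in dom] for a in dom]
    return left, right

D = [0, 1, 2, 3]
# The paper's dyadic ! takes the "choose" count on the left: K!N is comb(N, K).
pascal = table(lambda k, n: comb(n, k), D, D)

def le(a, b):
    return int(a <= b)

left_le, right_le = assoc_tables(le, [0, 1])
left_and, right_and = assoc_tables(lambda a, b: a & b, [0, 1])
```

The two rank 3 tables agree for "and" but differ for "less than or equal", which is the pattern shown in the tables above.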


1.4 Economy 
The utility of a language as a tool of thought increases with the
range of topics it can treat, but decreases with the amount of
vocabulary and the complexity of grammatical rules which the user
must keep in mind. Economy of notation is therefore important.

Economy requires that a large number of ideas 
be expressible in terms of a relatively small vocab- 
ulary. A fundamental scheme for achieving this is 
the introduction of grammatical rules by which 
meaningful phrases and sentences can be construct- 
ed by combining elements of the vocabulary. 

This scheme may be illustrated by the first example treated -- the
relatively simple and widely useful notion of the sum of the first N
integers was not introduced as a primitive, but as a phrase
constructed from two more generally useful notions, the function ⍳
for the production of a vector of integers, and the function +/ for
the summation of the elements of a vector. Moreover, the derived
function +/ is itself a phrase, summation being a derived function
constructed from the more general notion of the reduction operator
applied to a particular function.

Economy is also achieved by generality in the functions introduced.
For example, the definition of the factorial function denoted by ! is
not restricted to integers, and the gamma function of X may therefore
be written as !X-1. Similarly, the relations defined on all real
arguments provide several important logical functions when applied to
boolean arguments: exclusive-or (≠), material implication (≤), and
equivalence (=).

The economy achieved for the matters treated 
thus far can be assessed by recalling the vocabulary 
introduced: 


      ⍳ ⍴ ⌽ ⊤ ,
      / \ .
      + - × ÷ * ⍟ ! ⌈ ⌊ ⍉
      ∨ ∧ ~ < ≤ = ≥ > ≠


The five functions and three operators listed in the 
first two rows are of primary interest, the remain- 
ing familiar functions having been introduced to 
illustrate the versatility of the operators. 

A significant economy of symbols, as opposed to economy of functions,
is attained by allowing any symbol to represent both a monadic
function (i.e. a function of one argument) and a dyadic function, in
the same manner that the minus sign is commonly used for both
subtraction and negation. Because the two functions represented may,
as in the case of the minus sign, be related, the burden of
remembering symbols is eased.

For example, X*Y and *Y represent power and exponential, X⍟Y and ⍟Y
represent base X logarithm and natural logarithm, X÷Y and ÷Y
represent division and reciprocal, and X!Y and !Y represent the
binomial coefficient function and the factorial (that is,
X!Y←→(!Y)÷(!X)×(!Y-X)). The symbol ⍴ used for the dyadic function of
replication also represents a monadic function which gives the shape
of the argument (that is, X←→⍴X⍴Y), the symbol ⌽ used for the monadic
reversal function also represents the dyadic rotate function
exemplified by 2⌽⍳5←→3 4 5 1 2, and by ¯2⌽⍳5←→4 5 1 2 3, and finally,
the comma represents not only catenation, but also the monadic ravel,
which produces a vector of the elements of its argument in
"row-major" order. For example:

      T 2                ,T 2
0 0 1 1            0 0 1 1 0 1 0 1
0 1 0 1

Simplicity of the grammatical rules of a nota- 
tion is also important. Because the rules used thus 
far have been those familiar in mathematical nota- 
tion, they have not been made explicit, but two 
simplifications in the order of execution should be 
remarked: 


(1) All functions are treated alike, and there are no rules of
    precedence such as × being executed before +.

(2) The rule that the right argument of a monadic function is the
    value of the entire expression to its right, implicit in the
    order of execution of an expression such as SIN LOG !N, is
    extended to dyadic functions.


The second rule has certain useful consequences in reduction and
scan. Since F/V is equivalent to placing the function F between the
elements of V, the expression -/V gives the alternating sum of the
elements of V, and ÷/V gives the alternating product. Moreover, if B
is a boolean vector, then <\B "isolates" the first 1 in B, since all
elements following it become 0. For example:

      <\0 0 1 1 0 1 1 ←→ 0 0 1 0 0 0 0
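
The effect of right-to-left execution on reduction and scan can be imitated with a small Python sketch. The names `reduce_right` and `scan` are hypothetical, and the sketch assumes a non-empty vector.

```python
from functools import reduce

def reduce_right(f, v):
    """Reduction with right-to-left grouping: v[0] f (v[1] f (... f v[-1]))."""
    return reduce(lambda acc, x: f(x, acc), reversed(v[:-1]), v[-1])

def scan(f, v):
    """Scan: the right-to-left reduction of every non-empty prefix of v."""
    return [reduce_right(f, v[:k]) for k in range(1, len(v) + 1)]

alternating_sum = reduce_right(lambda a, b: a - b, [1, 2, 3, 4])   # 1-(2-(3-4))
alternating_product = reduce_right(lambda a, b: a / b, [8, 4, 2])  # 8/(4/2)
first_one = scan(lambda a, b: int(a < b), [0, 0, 1, 1, 0, 1, 1])
```

Because the grouping starts at the right, subtraction yields 1-2+3-4 and the less-than scan keeps only the first 1 of the boolean vector.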


Syntactic rules are further simplified by adopting a single form for
all dyadic functions, which appear between their arguments, and for
all monadic functions, which appear before their arguments. This
contrasts with the variety of rules in mathematics. For example, the
symbols for the monadic functions of negation, factorial, and
magnitude precede, follow, and surround their arguments,
respectively. Dyadic functions show even more variety.


1.5 Amenability to Formal Proofs 

The importance of formal proofs and deriva- 
tions is clear from their role in mathematics. Sec- 
tion 4 is largely devoted to formal proofs, and we 
will limit the discussion here to the introduction 
of the forms used. 

Proof by exhaustion consists of exhaustively examining all of a
finite number of special cases. Such exhaustion can often be simply
expressed by applying some outer product to arguments which include
all elements of the relevant domain. For example, if D←0 1, then
D∘.∧D gives all cases of application of the and function. Moreover,
DeMorgan's law can be proved exhaustively by comparing each element
of the matrix D∘.∧D with each element of ~(~D)∘.∨(~D) as follows:

      D∘.∧D               ~(~D)∘.∨(~D)
0 0                 0 0
0 1                 0 1
      (D∘.∧D)=(~(~D)∘.∨(~D))
1 1
1 1
      ∧/,(D∘.∧D)=(~(~D)∘.∨(~D))
1


Questions of associativity can be addressed similarly, the following
expressions showing the associativity of and and the
non-associativity of not-and:

      ∧/,((D∘.∧D)∘.∧D)=(D∘.∧(D∘.∧D))
1
      ∧/,((D∘.⍲D)∘.⍲D)=(D∘.⍲(D∘.⍲D))
0
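
The same exhaustive checks can be written as a short Python sketch. The names are hypothetical; booleans are represented by the integers 0 and 1, as in the text.

```python
D = [0, 1]

# Tables corresponding to "and" and to the DeMorgan form not((not a) or (not b)).
and_table = [[a & b for b in D] for a in D]
de_morgan_table = [[1 - ((1 - a) | (1 - b)) for b in D] for a in D]
de_morgan_holds = and_table == de_morgan_table

def is_associative(f):
    """Compare (a f b) f c with a f (b f c) for every triple drawn from D."""
    return all(f(f(a, b), c) == f(a, f(b, c)) for a in D for b in D for c in D)

def nand(a, b):
    return 1 - (a & b)

and_is_associative = is_associative(lambda a, b: a & b)
nand_is_associative = is_associative(nand)
```

With only two elements in the domain, four cases settle DeMorgan's law and eight cases settle each associativity question.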


A proof by a sequence of identities is presented by listing a
sequence of expressions, annotating each expression with the
supporting evidence for its equivalence with its predecessor. For
example, a formal proof of the identity A.1 suggested by the first
example treated would be presented as follows:

      +/⍳N
      +/⌽⍳N                  + is associative and commutative
      ((+/⍳N)+(+/⌽⍳N))÷2     (X+X)÷2←→X
      (+/((⍳N)+(⌽⍳N)))÷2     + is associative and commutative
      (+/((N+1)⍴N))÷2        Lemma
      ((N+1)×N)÷2            Definition of ×

The fourth annotation above concerns an identity which, after
observation of the pattern in the special case (⍳5)+(⌽⍳5), might be
considered obvious or might be considered worthy of formal proof in a
separate lemma.

Inductive proofs proceed in two steps: 1) some identity (called the
induction hypothesis) is assumed true for a fixed integer value of
some parameter N and this assumption is used to prove that the
identity also holds for the value N+1, and 2) the identity is shown
to hold for some integer value K. The conclusion is that the identity
holds for all integer values of N which equal or exceed K.


Recursive definitions often provide convenient bases for inductive
proofs. As an example we will use the recursive definition of the
binomial coefficient function BC given by A.3 in an inductive proof
showing that the sum of the binomial coefficients of order N is 2*N.
As the induction hypothesis we assume the identity:

      +/BC N ←→ 2*N

and proceed as follows:

      +/BC N+1
      +/(X,0)+(0,X←BC N)     A.3
      (+/X,0)+(+/0,X)        + is associative and commutative
      (+/X)+(+/X)            0+Y←→Y
      2×+/X                  Y+Y←→2×Y
      2×+/BC N               Definition of X
      2×2*N                  Induction hypothesis
      2*N+1                  Property of Power (*)

It remains to show that the induction hypothesis is true for some
integer value of N. From the recursive definition A.3, the value of
BC 0 is the value of the rightmost expression, namely 1.
Consequently, +/BC 0 is 1, and therefore equals 2*0.
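
A numerical spot check is no substitute for the induction, but it can be sketched in Python. The function `bc` below is a hypothetical transcription of the recursion used in the proof step marked A.3 (append a zero, prepend a zero, add), with 1 as the value for order 0.

```python
def bc(n):
    """Binomial coefficients of order n as a list of n+1 integers."""
    if n == 0:
        return [1]
    x = bc(n - 1)
    # (X,0)+(0,X): shifted copies of the previous row, added element by element
    return [a + b for a, b in zip(x + [0], [0] + x)]

row_sums = [sum(bc(n)) for n in range(10)]
```

Each row sum doubles the previous one, mirroring the step from 2×+/BC N to 2*N+1 in the proof.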

We will conclude with a proof that DeMorgan's law for scalar
arguments, represented by:

      A∧B ←→ ~(~A)∨(~B)          A.4

and proved by exhaustion, can indeed be extended to vectors of
arbitrary length as indicated earlier by the putative identity:

      ∧/V ←→ ~∨/~V               A.5

As the induction hypothesis we will assume that A.5 is true for
vectors of length (⍴V)-1.

We will first give formal recursive definitions of the derived
functions and-reduction and or-reduction (∧/ and ∨/), using two new
primitives, indexing, and drop. Indexing is denoted by an expression
of the form X[I], where I is a single index or array of indices of
the vector X. For example, if X←2 3 5 7, then X[2] is 3, and X[2 1]
is 3 2. Drop is denoted by K↓X and is defined to drop |K (i.e., the
magnitude of K) elements from X, from the head if K>0 and from the
tail if K<0. For example, 2↓X is 5 7 and ¯2↓X is 2 3. The take
function (to be used later) is denoted by ↑ and is defined
analogously. For example, 3↑X is 2 3 5 and ¯3↑X is 3 5 7.

The following functions provide formal definitions of and-reduction
and or-reduction:

      ANDRED:ω[1]∧ANDRED 1↓ω:0=⍴ω:1          A.6
      ORRED :ω[1]∨ ORRED 1↓ω:0=⍴ω:0          A.7

The inductive proof of A.5 proceeds as follows:

      ∧/V
      (V[1])∧(∧/1↓V)            A.6
      ~(~V[1])∨(~∧/1↓V)         A.4
      ~(~V[1])∨(~~∨/~1↓V)       A.5
      ~(~V[1])∨(∨/~1↓V)         ~~X←→X
      ~∨/(~V[1]),(~1↓V)         A.7
      ~∨/~(V[1],1↓V)            ~ distributes over ,
      ~∨/~V                     Definition of , (catenation)
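
The recursive definitions A.6 and A.7 translate almost directly into Python. The sketch below (with the hypothetical names `andred`, `orred` and `a5_holds`) also tests the identity A.5 on every boolean vector of length 0 through 4, which illustrates the claim without replacing the inductive proof.

```python
from itertools import product

def andred(w):
    """And-reduction: 1 for the empty vector, else first element AND the rest."""
    return 1 if len(w) == 0 else w[0] & andred(w[1:])

def orred(w):
    """Or-reduction: 0 for the empty vector, else first element OR the rest."""
    return 0 if len(w) == 0 else w[0] | orred(w[1:])

def a5_holds(v):
    """Check that and-reduction of v equals not(or-reduction of not v)."""
    return andred(v) == 1 - orred([1 - x for x in v])

all_cases_hold = all(a5_holds(list(v))
                     for n in range(5)
                     for v in product([0, 1], repeat=n))
```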


2. Polynomials 


If C is a vector of coefficients and X is a scalar, then the
polynomial in X with coefficients C may be written simply as
+/C×X*¯1+⍳⍴C, or +/(X*¯1+⍳⍴C)×C, or (X*¯1+⍳⍴C)+.×C. However, to apply
to a non-scalar array of arguments X, the power function * should be
replaced by the power table ∘.* as shown in the following definition
of the polynomial function:

      P:(ω∘.*¯1+⍳⍴α)+.×α          B.1

For example, 1 3 3 1 P 0 1 2 3 4 ←→ 1 8 27 64 125. If ⍴α is replaced
by 1↑⍴α, then the function applies also to matrices and higher
dimensional arrays of sets of coefficients representing (along the
leading axis of α) collections of coefficients of different
polynomials.

This definition shows clearly that the polynomial is a linear
function of the coefficient vector. Moreover, if α and ω are vectors
of the same shape, then the pre-multiplier ω∘.*¯1+⍳⍴α is the
Vandermonde matrix of ω and is therefore invertible if the elements
of ω are distinct. Hence if C and X are vectors of the same shape,
and if Y←C P X, then the inverse (curve-fitting) problem is clearly
solved by applying the matrix inverse function ⌹ to the Vandermonde
matrix and using the identity:

      C ←→ (⌹X∘.*¯1+⍳⍴X)+.×Y
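
A minimal Python sketch of the polynomial function and of the curve-fitting identity follows. The names `poly` and `fit` are hypothetical, and exact rational Gauss-Jordan elimination is used here in place of the matrix inverse function; it assumes the arguments are distinct so that the Vandermonde matrix is invertible.

```python
from fractions import Fraction

def poly(c, xs):
    """Evaluate the polynomial with coefficients c (ascending powers) at each x."""
    return [sum(ck * x ** k for k, ck in enumerate(c)) for x in xs]

def fit(xs, ys):
    """Recover coefficients from values by solving the Vandermonde system."""
    n = len(xs)
    # Augmented matrix: Vandermonde rows followed by the observed value.
    a = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)]
         for x, y in zip(xs, ys)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n] for row in a]

values = poly([1, 3, 3, 1], [0, 1, 2, 3, 4])
recovered = fit([0, 1, 2, 3], poly([1, 3, 3, 1], [0, 1, 2, 3]))
```

Four distinct arguments determine the four coefficients exactly, which is the sense in which the inverse problem is solved.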


2.1 Products of Polynomials 

The "product of two polynomials B and C" is commonly taken to mean
the coefficient vector D such that:

      D P X ←→ (B P X)×(C P X)

It is well-known that D can be computed by taking products over all
pairs of elements from B and C and summing over subsets of these
products associated with the same exponent in the result. These
products occur in the function table B∘.×C, and it is easy to show
informally that the powers of X associated with the elements of B∘.×C
are given by the addition table E←(¯1+⍳⍴B)∘.+(¯1+⍳⍴C). For example:

      X←2
      B←3 1 2 3
      C←2 0 3
      E←(¯1+⍳⍴B)∘.+(¯1+⍳⍴C)
      B∘.×C        E           X*E
6 0 9        0 1 2       1  2  4
2 0 3        1 2 3       2  4  8
4 0 6        2 3 4       4  8 16
6 0 9        3 4 5       8 16 32
      +/,(B∘.×C)×X*E
518
      (B P X)×(C P X)
518


The foregoing suggests the following identity, which will be
established formally in Section 4:

      (B P X)×(C P X)←→+/,(B∘.×C)×X*(¯1+⍳⍴B)∘.+(¯1+⍳⍴C)          B.2

Moreover, the pattern of the exponent table E shows that elements of
B∘.×C lying on diagonals are associated with the same power, and that
the coefficient vector of the product polynomial is therefore given
by sums over these diagonals. The table B∘.×C therefore provides an
excellent organization for the manual computation of products of
polynomials. In the present example these sums give the vector
D←6 2 13 9 6 9, and D P X may be seen to equal (B P X)×(C P X).

Sums over the required diagonals of B∘.×C can also be obtained by
bordering it by zeros, skewing the result by rotating successive rows
by successive integers, and then summing the columns. We thus obtain
a definition for the polynomial product function as follows:

      PP:+⌿(1-⍳⍴α)⌽α∘.×ω,1↓0×α
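
The diagonal sums can be sketched in Python by accumulating each entry of the product table at the position given by the sum of its row and column indices. The name `poly_product` is hypothetical, and the sketch accumulates directly instead of bordering and rotating the table.

```python
def poly_product(b, c):
    """Coefficients (ascending powers) of the product of polynomials b and c."""
    table = [[bi * cj for cj in c] for bi in b]     # the outer product table
    d = [0] * (len(b) + len(c) - 1)
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            d[i + j] += value                       # i + j is the exponent
    return d

def poly_at(coeffs, x):
    """Value at x of the polynomial with the given ascending coefficients."""
    return sum(ck * x ** k for k, ck in enumerate(coeffs))

D = poly_product([3, 1, 2, 3], [2, 0, 3])
```

For the example in the text this gives the diagonal sums 6 2 13 9 6 9, and evaluating at 2 reproduces the value 518.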


We will now develop an alternative method based upon the simple
observation that if B PP C produces the product of polynomials B and
C, then PP is linear in both of its arguments. Consequently,

      PP:α+.×A+.×ω

where A is an array to be determined. A must be of rank 3, and must
depend on the exponents of the left argument (¯1+⍳⍴α), of the result
(¯1+⍳⍴1↓α,ω), and of the right argument. The "deficiencies" of the
right exponent are given by the difference table (⍳⍴1↓α,ω)∘.-⍳⍴ω, and
comparison of these values with the left exponents yields A. Thus

      A←(¯1+⍳⍴α)∘.=((⍳⍴1↓α,ω)∘.-⍳⍴ω)

and

      PP:α+.×((¯1+⍳⍴α)∘.=(⍳⍴1↓α,ω)∘.-⍳⍴ω)+.×ω

Since α+.×A is a matrix, this formulation suggests that if D←B PP C,
then C might be obtained from D by pre-multiplying it by the inverse
matrix (⌹B+.×A), thus providing division of polynomials. Since B+.×A
is not square (having more rows than columns), this will not work,
but by replacing M←B+.×A by either its leading square part
(2⍴⌊/⍴M)↑M, or by its trailing square part (-2⍴⌊/⍴M)↑M, one obtains
two results, one corresponding to division with low-order remainder
terms, and the other to division with high-order remainder terms.


2.2 Derivative of a Polynomial 

Since the derivative of X*N is N×X*N-1, we may use the rules for the
derivative of a sum of functions and of a product of a function with
a constant, to show that the derivative of the polynomial C P X is
the polynomial (1↓C×¯1+⍳⍴C) P X. Using this result it is clear that
the integral is the polynomial (A,C÷⍳⍴C) P X, where A is an arbitrary
scalar constant. The expression 1⌽C×¯1+⍳⍴C also yields the
coefficients of the derivative, but as a vector of the same shape as
C and having a final zero element.
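
Both coefficient rules are short enough to sketch in Python. The names `derivative` and `integral` are hypothetical, and exact fractions are used so that differentiating an integral returns the original coefficients without rounding.

```python
from fractions import Fraction

def derivative(c):
    """Multiply each coefficient by its exponent, then drop the constant term."""
    return [k * ck for k, ck in enumerate(c)][1:]

def integral(c, a=0):
    """Divide each coefficient by (exponent + 1) and prepend the constant a."""
    return [Fraction(a)] + [Fraction(ck) / (k + 1) for k, ck in enumerate(c)]

d = derivative([3, 1, 2, 4])
round_trip = derivative(integral([3, 1, 2, 4], a=7))
```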


2.3 Derivative of a Polynomial with Respect 
to Its Roots 

If R is a vector of three elements, then the derivatives of the
polynomial ×/X-R with respect to each of its three roots are
-(X-R[2])×(X-R[3]), and -(X-R[1])×(X-R[3]), and -(X-R[1])×(X-R[2]).
More generally, the derivative of ×/X-R with respect to R[J] is
simply -(X-R)×.*J≠⍳⍴R, and the vector of derivatives with respect to
each of the roots is -(X-R)×.*I∘.≠I←⍳⍴R.

The expression ×/X-R for a polynomial with roots R applies only to a
scalar X, the more general expression being ×/X∘.-R. Consequently,
the general expression for the matrix of derivatives (of the
polynomial evaluated at X[I] with respect to root R[J]) is given by:

      -(X∘.-R)×.*I∘.≠I←⍳⍴R          B.3


2.4 Expansion of a Polynomial 

Binomial expansion concerns the development of an identity in the
form of a polynomial in X for the expression (X+Y)*N. For the special
case of Y=1 we have the well-known expression in terms of the
binomial coefficients of order N:

      (X+1)*N ←→ ((0,⍳N)!N) P X


By extension we speak of the expansion of a polynomial as a matter of
determining coefficients D such that:

      C P X+Y ←→ D P X

The coefficients D are, in general, functions of Y. If Y=1 they again
depend only on binomial coefficients, but in this case on the several
binomial coefficients of various orders, specifically on the matrix
J∘.!J←¯1+⍳⍴C.

For example, if C←3 1 2 4, and C P X+1←→D P X, then D depends on the
matrix:

      0 1 2 3∘.!0 1 2 3
1 1 1 1
0 1 2 3
0 0 1 3
0 0 0 1

and D must clearly be a weighted sum of the columns, the weights
being the elements of C. Thus:

      D←(J∘.!J←¯1+⍳⍴C)+.×C


Jotting down the matrix of coefficients and per- 
forming the indicated matrix product provides a 
quick and reliable way to organize the otherwise 
messy manual calculation of expansions. 

If B is the appropriate matrix of binomial coefficients, then
D←B+.×C, and the expansion function is clearly linear in the
coefficients C. Moreover, expansion for Y=¯1 must be given by the
inverse matrix ⌹B, which will be seen to contain the alternating
binomial coefficients. Finally, since:

      C P X+(K+1) ←→ C P (X+K)+1 ←→ (B+.×C) P (X+K)

it follows that the expansion for positive integer values of Y must
be given by products of the form:

      B+.×B+.×B+.×B+.×C

where the B occurs Y times.

Because +.× is associative, the foregoing can be written as M+.×C,
where M is the product of Y occurrences of B. It is interesting to
examine the successive powers of B, computed either manually or by
machine execution of the following inner product power function:

      IPP:α+.×α IPP ω-1:ω=0:J∘.=J←¯1+⍳1↑⍴α

Comparison of B IPP K with B for a few values of K shows an obvious
pattern which may be expressed as:

      B IPP K ←→ B×K*0⌈-J∘.-J←¯1+⍳1↑⍴B

The interesting thing is that the right side of this identity is
meaningful for non-integer values of K, and, in fact, provides the
desired expression for the general expansion C P X+Y:

      C P(X+Y) ←→ (((J∘.!J)×Y*0⌈-J∘.-J←¯1+⍳⍴C)+.×C)P X          B.4

The right side of B.4 is of the form (M+.×C)P X, where M itself is of
the form B×Y*E and can be displayed informally (for the case 4=⍴C) as
follows:


      1 1 1 1          0 1 2 3
      0 1 2 3   ×Y*    0 0 1 2
      0 0 1 3          0 0 0 1
      0 0 0 1          0 0 0 0

Since Y*K multiplies the single-diagonal matrix B×(K=E), the
expression for M can also be written as the inner product (Y*J)+.×T,
where T is a rank 3 array whose Kth plane is the matrix B×(K=E). Such
a rank three array can be formed from an upper triangular matrix M by
making a rank 3 array whose first plane is M (that is,
(1=⍳1↑⍴M)∘.×M) and rotating it along the first axis by the matrix
J∘.-J, whose Kth superdiagonal has the value -K. Thus:

      DS:(I∘.-I)⌽[1](1=I←⍳1↑⍴ω)∘.×ω          B.5

      DS K∘.!K←¯1+⍳3
1 0 0
0 1 0
0 0 1

0 1 0
0 0 2
0 0 0

0 0 1
0 0 0
0 0 0


Substituting these results in B.4 and using the associativity of +.×,
we have the following identity for the expansion of a polynomial,
valid for non-integer as well as integer values of Y:

      C P X+Y ←→ ((Y*J)+.×(DS J∘.!J←¯1+⍳⍴C)+.×C)P X          B.6

For example:

      Y←3
      C←3 1 4 2
      M←(Y*J)+.×DS J∘.!J←¯1+⍳⍴C
      M
1 3 9 27
0 1 6 27
0 0 1  9
0 0 0  1

3. Representations 


The subjects of mathematical analysis and computation can be
represented in a variety of ways, and each representation may possess
particular advantages. For example, a positive integer N may be
represented simply by N check-marks; less simply, but more compactly,
in Roman numerals; even less simply, but more conveniently for the
performance of addition and multiplication, in the decimal system;
and less familiarly, but more conveniently for the computation of the
least common multiple and the greatest common divisor, in the prime
decomposition scheme to be discussed here.

Graphs, which concern connections among a collection of elements, are
an example of a more complex entity which possesses several useful
representations. For example, a simple directed graph of N elements
(usually called nodes) may be represented by an N by N boolean matrix
B (usually called an adjacency matrix) such that B[I;J]=1 if there is
a connection from node I to node J. Each connection represented by a
1 in B is called an edge, and the graph can also be represented by a
+/,B by N matrix in which each row shows the nodes connected by a
particular edge.

Functions also admit different useful representations. For example, a
permutation function, which yields a reordering of the elements of
its vector argument X, may be represented by a permutation vector P
such that the permutation function is simply X[P], by a cycle
representation which presents the structure of the function more
directly, by the boolean matrix B←P∘.=⍳⍴P such that the permutation
function is B+.×X, or by a radix representation R which employs one
of the columns of the matrix 1+(⌽⍳N)⊤¯1+⍳!N←⍴X, and has the property
that 2|+/R-1 is the parity of the permutation represented.

In order to use different representations con- 
veniently, it is important to be able to express the 
transformations between representations clearly 
and precisely. Conventional mathematical nota- 
tion is often deficient in this respect, and the pres- 
ent section is devoted to developing expressions for 
the transformations between representations useful 
in a variety of topics: number systems, polynomi- 
als, permutations, graphs, and boolean algebra. 


3.1 Number Systems 

We will begin the discussion of representations 
with a familiar example, the use of different repre- 
sentations of positive integers and the transforma- 
tions between them. Instead of the positional or 
base-value representations commonly treated, we 
will use prime decomposition, a representation 
whose interesting properties make it useful in in- 
troducing the idea of logarithms as well as that of 
number representation [ 6, Ch.16 ]. 

If P is a vector of the first ⍴P primes and E is a vector of
non-negative integers, then E can be used to represent the number
P×.*E, and all of the integers ⍳⌈/P can be so represented. For
example, 2 3 5 7 ×.* 0 0 0 0 is 1 and 2 3 5 7 ×.* 1 1 0 0 is 6 and:

      P
2 3 5 7
      ME
0 1 0 2 0 1 0 3 0 1
0 0 1 0 0 1 0 0 2 0
0 0 0 0 1 0 0 0 0 1
0 0 0 0 0 0 1 0 0 0
      P×.*ME
1 2 3 4 5 6 7 8 9 10


The similarity to logarithms can be seen in the 
identity: 


      ×/P×.*ME ←→ P×.*+/ME


which may be used to effect multiplication by ad- 
dition. 

Moreover, if we define GCD and LCM to give the greatest common
divisor and least common multiple of elements of vector arguments,
then:

      GCD P×.*ME ←→ P×.*⌊/ME
      LCM P×.*ME ←→ P×.*⌈/ME

      ME
2 1 0
3 1 2
2 2 0
1 2 3
      V←P×.*ME
      V
18900 7350 3087
      GCD V                    LCM V
21                       926100
      P×.*⌊/ME                 P×.*⌈/ME
21                       926100


In defining the function GCD, we will use the operator / with a
boolean argument B (as in B/). It produces the compression function
which selects elements from its right argument according to the ones
in B. For example, 1 0 1 0 1/⍳5 is 1 3 5. Moreover, the function B/
applied to a matrix argument compresses rows (thus selecting certain
columns), and the function B⌿ compresses columns to select rows.
Thus:

      GCD:GCD M,(M←⌊/R)|R:1≥⍴R←(ω≠0)/ω:+/R
      LCM:(×/X)÷GCD X←(1↑ω),LCM 1↓ω:0=⍴ω:1

The transformation to the value of a number from its prime
decomposition representation (VFR) and the inverse transformation to
the representation from the value (RFV) are given by:

      VFR:α×.*ω
      RFV:D+α RFV ω÷α×.*D:∧/~D←0=α|ω:D

For example:

      P VFR 2 1 3 1
10500
      P RFV 10500
2 1 3 1
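
The two transformations, and the use of minima and maxima of exponents for the greatest common divisor and least common multiple, can be sketched in Python. The names `vfr` and `rfv` follow the text but the bodies are an illustrative rewrite, using a loop where the paper uses a recursive definition; `rfv` assumes a positive integer whose prime factors all occur in `p`.

```python
from functools import reduce
from math import gcd, prod

P = [2, 3, 5, 7]

def vfr(p, e):
    """Value from representation: product of primes raised to exponents."""
    return prod(pi ** ei for pi, ei in zip(p, e))

def rfv(p, n):
    """Representation from value: exponent of each prime of p in n."""
    exponents = []
    for pi in p:
        k = 0
        while n % pi == 0:
            n //= pi
            k += 1
        exponents.append(k)
    return exponents

ME = [[2, 1, 0], [3, 1, 2], [2, 2, 0], [1, 2, 3]]   # one column per number
V = [vfr(P, col) for col in zip(*ME)]
gcd_from_exponents = vfr(P, [min(row) for row in ME])
lcm_from_exponents = vfr(P, [max(row) for row in ME])
```

Taking the row-wise minimum or maximum of the exponent matrix is the analogue of the two identities stated above for GCD and LCM.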


3.2 Polynomials 

Section 2 introduced two representations of a polynomial on a scalar
argument X, the first in terms of a vector of coefficients C (that
is, +/C×X*¯1+⍳⍴C), and the second in terms of its roots R (that is,
×/X-R). The coefficient representation is convenient for adding
polynomials (C+D) and for obtaining derivatives (1↓C×¯1+⍳⍴C). The
root representation is convenient for other purposes, including
multiplication which is given by R1,R2.

We will now develop a function CFR (Coefficients from Roots) which
transforms a roots representation to an equivalent coefficient
representation, and an inverse function RFC. The development will be
informal; a formal derivation of CFR appears in Section 4.

The expression for CFR will be based on Newton's symmetric functions,
which yield the coefficients as sums over certain of the products
over all subsets of the arithmetic negation (that is, -R) of the
roots R. For example, the coefficient of the constant term is given
by ×/-R, the product over the entire set, and the coefficient of the
next term is a sum of the products over the elements of -R taken
(⍴R)-1 at a time.

The function defined by A.2 can be used to give the products over all
subsets as follows:

      P←(-R)×.*M←T ⍴R

The elements of P summed to produce a given coefficient depend upon
the number of elements of R excluded from the particular product,
that is, upon +⌿~M, the sum of the columns of the complement of the
boolean "subset" matrix T⍴R.

The summation over P may therefore be expressed as
((0,⍳⍴R)∘.=+⌿~M)+.×P, and the complete expression for the
coefficients C becomes:

      C←((0,⍳⍴R)∘.=+⌿~M)+.×(-R)×.*M←T ⍴R


For example, if R←2 3 5, then

      M                          +⌿~M
0 0 0 0 1 1 1 1            3 2 2 1 2 1 1 0
0 0 1 1 0 0 1 1                  (0,⍳⍴R)∘.=+⌿~M
0 1 0 1 0 1 0 1            0 0 0 0 0 0 0 1
      (-R)×.*M             0 0 0 1 0 1 1 0
1 ¯5 ¯3 15 ¯2 10 6 ¯30     0 1 1 0 1 0 0 0
                           1 0 0 0 0 0 0 0
      ((0,⍳⍴R)∘.=+⌿~M)+.×(-R)×.*M←T ⍴R
¯30 31 ¯10 1


The function CFR which produces the coefficients from the roots may
therefore be defined and used as follows:

      CFR:((0,⍳⍴ω)∘.=+⌿~M)+.×(-ω)×.*M←T ⍴ω          C.1

      CFR 2 3 5
¯30 31 ¯10 1
      (CFR 2 3 5) P X←1 2 3 4 5 6 7 8
¯8 0 0 ¯2 0 12 40 90
      ×/X∘.-2 3 5
¯8 0 0 ¯2 0 12 40 90

The inverse transformation RFC is more difficult, but can be
expressed as a successive approximation scheme as follows:

      RFC:(¯1+⍳⍴1↓ω)G ω
      G:(α-Z)G ω:TOL≥⌈/|Z←α STEP ω:α-Z
      STEP:(⌹(α∘.-α)×.*I∘.≠I←⍳⍴α)+.×(α∘.*¯1+⍳⍴ω)+.×ω

      ⎕←C←CFR 2 3 5 7
210 ¯247 101 ¯17 1
      TOL←1E¯8
      RFC C
7 5 2 3

The order of the roots in the result is, of course, immaterial. The
final element of any argument of RFC must be 1, since any polynomial
equivalent to ×/X-R must necessarily have a coefficient of 1 for the
high order term.

The foregoing definition of RFC applies only to coefficients of
polynomials whose roots are all real. The left argument of G in RFC
provides (usually satisfactory) initial approximations to the roots,
but in the general case some at least must be complex. The following
example, using the roots of unity as the initial approximation, was
executed on an APL system which handles complex numbers:


      (*○0J2×(¯1+⍳N)÷N←⍴1↓ω)G ω          C.2

      ⎕←C←CFR 1J1 1J¯1 1J2 1J¯2
10 ¯14 11 ¯4 1
      RFC C
1J¯1 1J2 1J1 1J¯2

The monadic function ○ used above multiplies its argument by pi.

In Newton's method for the root of a scalar function F, the next
approximation is given by A←A-(F A)÷DF A, where DF is the derivative
of F. The function STEP is the generalization of Newton's method to
the case where F is a vector function of a vector. It is of the form
(⌹M)+.×B, where B is the value of the polynomial with coefficients ω,
the original argument of RFC, evaluated at α, the current
approximation to the roots; analysis similar to that used to derive
B.3 shows that M is the matrix of derivatives of a polynomial with
roots α, the derivatives being evaluated at α.

Examination of the expression for M shows that its off-diagonal
elements are all zero, and the expression (⌹M)+.×B may therefore be
replaced by B÷D, where D is the vector of diagonal elements of M.
Since (I,J)↓N drops I rows and J columns from a matrix N, the vector
D may be expressed as ×/0 1↓(¯1+⍳⍴α)⌽α∘.-α; the definition of the
function STEP may therefore be replaced by the more efficient
definition:

      STEP:((α∘.*¯1+⍳⍴ω)+.×ω)÷×/0 1↓(¯1+⍳⍴α)⌽α∘.-α          C.3

This last is the elegant method of Kerner [7]. Using starting values
given by the left argument of G in C.2, it converges in seven steps
(with a tolerance TOL←1E¯8) for the sixth-order example given by
Kerner.
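
The simultaneous iteration can be sketched in Python using the built-in complex type. The function names `kerner_step` and `rfc` are hypothetical. Two details are assumptions of this sketch and not taken from the text: the starting values are powers of 0.4+0.9j (a common choice that avoids symmetric starting points) in place of the roots of unity, and the loop is capped at `max_iter` steps.

```python
def kerner_step(a, c):
    """Correction vector: p(a[i]) divided by the product of (a[i] - a[j]), j != i.

    c holds the coefficients in ascending order; the last one must be 1.
    """
    def p(x):
        return sum(ck * x ** k for k, ck in enumerate(c))

    corrections = []
    for i, ai in enumerate(a):
        denominator = 1
        for j, aj in enumerate(a):
            if j != i:
                denominator *= ai - aj
        corrections.append(p(ai) / denominator)
    return corrections

def rfc(c, tol=1e-8, max_iter=500):
    """Approximate all roots of the monic polynomial with coefficients c."""
    n = len(c) - 1
    a = [(0.4 + 0.9j) ** k for k in range(n)]   # assumed starting values
    for _ in range(max_iter):
        z = kerner_step(a, c)
        a = [ai - zi for ai, zi in zip(a, z)]
        if max(abs(zi) for zi in z) <= tol:
            break
    return a

real_roots = rfc([210, -247, 101, -17, 1])
complex_roots = rfc([10, -14, 11, -4, 1])
```

All approximations are updated from the previous vector at once, as in the definition of STEP; the order in which the roots come out depends on the starting values.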


3.3 Permutations 

A vector P whose elements are some permutation of its indices (that
is, ∧/1=+/P∘.=⍳⍴P) will be called a permutation vector. If D is a
permutation vector such that (⍴X)=⍴D, then X[D] is a permutation of
X, and D will be said to be the direct representation of this
permutation.

The permutation X[D] may also be expressed as B+.×X, where B is the
boolean matrix D∘.=⍳⍴D. The matrix B will be called the boolean
representation of the permutation. The transformations between direct
and boolean representations are:

      BFD:ω∘.=⍳⍴ω          DFB:ω+.×⍳1↑⍴ω


Because permutation is associative, the composition of permutations
satisfies the following relations:

      (X[D1])[D2] ←→ X[(D1[D2])]
      B2+.×(B1+.×X) ←→ (B2+.×B1)+.×X

The inverse of a boolean representation B is ⍉B, and the inverse of a
direct representation is either ⍋D or D⍳⍳⍴D. (The grade function ⍋
grades its argument, giving a vector of indices to its elements in
ascending order, maintaining existing order among equal elements.
Thus ⍋3 7 1 4 is 3 1 4 2 and ⍋3 7 3 4 is 1 3 4 2. The index-of
function ⍳ determines the smallest index in its left argument of each
element of its right argument. For example, 'ABCDE'⍳'BABE' is
2 1 2 5, and 'BABE'⍳'ABCDE' is 2 1 5 5 4.)
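
The direct and boolean representations, and the use of grade as an inverse, can be sketched in Python. The function names follow the text where possible (`bfd`, `dfb`) and are otherwise hypothetical; permutation vectors keep the 1-origin values used in the text, with the shift to Python's 0-origin indexing made inside `permute`.

```python
def permute(x, d):
    """Select the elements of x by the 1-origin permutation vector d."""
    return [x[i - 1] for i in d]

def bfd(d):
    """Boolean from direct: row k has a single 1 in column d[k]."""
    return [[int(dk == j) for j in range(1, len(d) + 1)] for dk in d]

def dfb(b):
    """Direct from boolean: inner product of each row with 1 2 ... n."""
    return [sum(bit * j for j, bit in enumerate(row, start=1)) for row in b]

def grade(v):
    """1-origin indices that sort v ascending, stable among equal elements."""
    return [i + 1 for i in sorted(range(len(v)), key=lambda i: v[i])]

D = [4, 2, 6, 1, 3, 5, 7]
X = list("ABCDEFG")
inverse = grade(D)
```

Applying `D` and then `inverse` restores the original vector, and composing two selections agrees with selecting by the composed permutation vector.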

The cycle representation also employs a permutation vector. Consider
a permutation vector C and the segments of C marked off by the vector
C=⌊\C. For example, if C←7 3 6 5 2 1 4, then C=⌊\C is 1 1 0 0 1 1 0,
and the blocks are:

      7
      3 6 5
      2
      1 4

Each block determines a "cycle" in the associated permutation in the
sense that if R is the result of permuting X, then:

      R[7] is X[7]
      R[3] is X[6]     R[6] is X[5]     R[5] is X[3]
      R[2] is X[2]
      R[1] is X[4]     R[4] is X[1]

If the leading element of C is the smallest (that is, 1), then C
consists of a single cycle, and the permutation of a vector X which
it represents is given by X[C]←X[1⌽C]. For example:

      X←'ABCDEFG'
      C←1 7 6 5 2 4 3
      X[C]←X[1⌽C]
      X
GDACBEF

Since X[Q]←A is equivalent to X←A[⍋Q], it follows that X[C]←X[1⌽C] is
equivalent to X←X[(1⌽C)[⍋C]], and the direct representation vector D
equivalent to C is therefore given (for the special case of a single
cycle) by D←(1⌽C)[⍋C].

In the more general case, the rotation of the complete vector (that
is, 1⌽C) must be replaced by rotations of the individual subcycles
marked off by C=⌊\C, as shown in the following definition of the
transformation to direct from cycle representation:

      DFC:(ω[⍋X++\X←ω=⌊\ω])[⍋ω]


If one wishes to catenate a collection of disjoint cycles to form a
single vector C such that C=⌊\C marks off the individual cycles, then
each cycle CI must first be brought to standard form by the rotation
(¯1+CI⍳⌊/CI)⌽CI, and the resulting vectors must be catenated in
descending order on their leading elements.

The inverse transformation from direct to cycle representation is
more complex, but can be approached by first producing the matrix of
all powers of D up to the ⍴Dth, that is, the matrix whose successive
columns are D and D[D] and (D[D])[D], etc. This is obtained by
applying the function POW to the one-column matrix D∘.+,0 formed from
D, where POW is defined and used as follows:

      POW:POW D,(D←ω[;1])[ω]:≤/⍴ω:ω

      ⎕←D←DFC C←7,3 6 5,2,1 4
4 2 6 1 3 5 7
      POW D∘.+,0
4 1 4 1 4 1 4
2 2 2 2 2 2 2
6 5 3 6 5 3 6
1 4 1 4 1 4 1
3 6 5 3 6 5 3
5 3 6 5 3 6 5
7 7 7 7 7 7 7


If M←POW D∘.+,0, then the cycle representation of D may be obtained
by selecting from M only "standard" rows which begin with their
smallest elements (SSR), by arranging these remaining rows in
descending order on their leading elements (DOL), and then catenating
the cycles in these rows (CIR). Thus:

      CFD:CIR DOL SSR POW ω∘.+,0

      SSR:(∧/M=1⌽M←⌊\ω)⌿ω
      DOL:ω[⍒ω[;1];]
      CIR:(,1,∧\0 1↓ω≠⌊\ω)/,ω

      DFC C←7,3 6 5,2,1 4
4 2 6 1 3 5 7
      CFD DFC C
7 3 6 5 2 1 4

In the definition of DOL, indexing is applied to
matrices. The indices for successive coordinates are
separated by semicolons, and a blank entry for any
axis indicates that all elements along it are
selected. Thus M[;1] selects column 1 of M.

The cycle representation is convenient for
determining the number of cycles in the permutation
represented (NC:+/ω=⌊\ω), the cycle lengths
(CL:X-0,¯1↓X←(1⌽ω=⌊\ω)/⍳⍴ω), and the power of the
permutation (PP:LCM CL ω). On the other hand, it is
awkward for composition and inversion.
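
A small Python sketch of these three functions follows. It mirrors NC, CL and PP under the same standard-form assumption as above; the helper `cycle_starts` and the use of `gcd` to form the least common multiple are choices made here for illustration.

```python
from functools import reduce
from math import gcd


def cycle_starts(c):
    """Boolean flags corresponding to C=⌊\\C: True where a cycle begins."""
    flags, low = [], None
    for x in c:
        start = low is None or x < low
        flags.append(start)
        if start:
            low = x
    return flags


def nc(c):
    """Number of cycles."""
    return sum(cycle_starts(c))


def cl(c):
    """Cycle lengths: differences between successive end positions."""
    flags = cycle_starts(c)
    ends = [i + 1 for i in range(len(c)) if flags[(i + 1) % len(c)]]  # 1⌽ marks cycle ends
    return [e - p for e, p in zip(ends, [0] + ends[:-1])]


def pp(c):
    """Least common multiple of the cycle lengths."""
    return reduce(lambda a, b: a * b // gcd(a, b), cl(c), 1)


c = [7, 3, 6, 5, 2, 1, 4]
print(nc(c), cl(c), pp(c))  # 4 [1, 3, 1, 2] 6
```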


The !N column vectors of the matrix
(⌽⍳N)⊤¯1+⍳!N are all distinct, and therefore provide
a potential radix representation [8] for the !N
permutations of order N. We will use instead a
related form obtained by increasing each element
by 1:


      RR:1+(⌽⍳ω)⊤¯1+⍳!ω


      RR 4
1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4
1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3 1 1 2 2 3 3
1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Transformations between this representation and
the direct form are given by:

      DFR:ω[1],X+ω[1]≤X←DFR 1↓ω:0=⍴ω:ω

      RFD:ω[1],RFD X-ω[1]≤X←1↓ω:0=⍴ω:ω

Some of the characteristics of this alternate
representation are perhaps best displayed by
modifying DFR to apply to all columns of a matrix
argument, and applying the modified function MF to
the result of the function RR:


      MF:ω[,1;],[1]X+ω[(1⍴⍴X)⍴1;]≤X←MF 1 0↓ω:0=1↑⍴ω:ω


      MF RR 4
1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4
2 2 3 3 4 4 1 1 3 3 4 4 1 1 2 2 4 4 1 1 2 2 3 3
3 4 2 4 2 3 3 4 1 4 1 3 2 4 1 4 1 2 2 3 1 3 1 2
4 3 4 2 3 2 4 3 4 1 3 1 4 2 4 1 2 1 3 2 3 1 2 1

The direct permutations in the columns of this 
result occur in lexical order (that is, in ascending 
order on the first element in which two vectors 
differ); this is true in general, and the alternate 
representation therefore provides a convenient way 
for producing direct representations in lexical or- 
der. 
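
The lexical-order property can be checked with a short Python sketch. The functions below follow the recursive shape of DFR and RFD, but the list-based encoding and the helper `rr` (which returns the columns of RR as separate lists) are assumptions made for this illustration.

```python
from itertools import permutations
from math import factorial


def rr(n):
    """Columns of RR n: 1 + digits of 0..n!-1 in the mixed radix n, n-1, ..., 1."""
    cols = []
    for k in range(factorial(n)):
        digits = []
        for radix in range(1, n + 1):    # least significant digit first
            k, r = divmod(k, radix)
            digits.append(r + 1)
        cols.append(digits[::-1])
    return cols


def dfr(r):
    """Direct from radix: bump every later element that is >= the leading one."""
    if not r:
        return []
    x = dfr(r[1:])
    return [r[0]] + [v + (r[0] <= v) for v in x]


def rfd(d):
    """Radix from direct: the inverse of dfr."""
    if not d:
        return []
    return [d[0]] + rfd([v - (d[0] <= v) for v in d[1:]])


columns = [dfr(c) for c in rr(4)]
print(columns[:3])  # [[1, 2, 3, 4], [1, 2, 4, 3], [1, 3, 2, 4]]
```

The permutations come out in the same order as `itertools.permutations(range(1, 5))`, which is lexical.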

The alternate representation also has the useful
property that the parity of the direct permutation
D is given by 2|+/¯1+RFD D, where M|N represents the
residue of N modulo M. The parity of a direct
representation can also be determined by the
function:


      PAR:2|+/,(I∘.>I←⍳⍴ω)∧ω∘.>ω
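
As a hedged check of the parity property, the Python below compares the radix-based parity with a plain count of inversions (pairs that appear out of order), which is the usual definition of parity. The inversion count is an assumption about the intent of PAR rather than a literal transcription of it.

```python
from itertools import permutations


def rfd(d):
    """Radix representation from a direct permutation (1-origin)."""
    if not d:
        return []
    return [d[0]] + rfd([v - (d[0] <= v) for v in d[1:]])


def parity_from_radix(d):
    """2|+/¯1+RFD D"""
    return sum(v - 1 for v in rfd(d)) % 2


def parity_from_inversions(d):
    """Parity as the number of out-of-order pairs, modulo 2."""
    n = len(d)
    return sum(d[i] > d[j] for i in range(n) for j in range(i + 1, n)) % 2


print(parity_from_radix([2, 1, 3, 4]))  # 1
```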


3.4 Directed Graphs 

A simple directed graph is defined by a set of K
nodes and a set of directed connections from one to
another of pairs of the nodes. The directed
connections may be conveniently represented by a K by
K boolean connection matrix C in which C[I;J]=1
denotes a connection from the Ith node to the Jth.

For example, if the four nodes of a graph are
represented by N←'QRST', and if there are connections
from node S to node Q, from R to T, and from T
to Q, then the corresponding connection matrix is
given by:

0 0 0 0
0 0 0 1
1 0 0 0
1 0 0 0


A connection from a node to itself (called a self- 
loop) is not permitted, and the diagonal of a con- 
nection matrix must therefore be zero. 

If P is any permutation vector of order ⍴N, then
N1←N[P] is a reordering of the nodes, and the
corresponding connection matrix is given by C[P;P]. We
may (and will) without loss of generality use the
numeric labels ⍳⍴N for the nodes, because if N is any
arbitrary vector of names for the nodes and L is
any list of numeric labels, then the expression
Q←N[L] gives the corresponding list of names and,
conversely, N⍳Q gives the list L of numeric labels.

The connection matrix C is convenient for
expressing many useful functions on a graph. For
example, +/C gives the out-degrees of the nodes,
+⌿C gives the in-degrees, +/,C gives the number of
connections or edges, ⍉C gives a related graph with
the directions of edges reversed, and C∨⍉C gives a
related "symmetric" or "undirected" graph.
Moreover, if we use the boolean vector
B←∨/(⍳1↑⍴C)∘.=L to represent the list of nodes L, then
B∨.∧C gives the boolean vector which represents the
set of nodes directly reachable from the set B.
Consequently, C∨.∧C gives the connections for paths of
length two in the graph C, and C∨C∨.∧C gives
connections for paths of length one or two. This leads
to the following function for the transitive closure
of a graph, which gives all connections through paths
of any length:


      TC:TC Z:∧/,ω=Z←ω∨ω∨.∧ω:Z


Node J is said to be reachable from node I if
(TC C)[I;J]=1. A graph is strongly-connected if
every node is reachable from every node, that is
∧/,TC C.

If D←TC C and D[I;I]=1 for some I, then node I is
reachable from itself through a path of some
length; the path is called a circuit, and node I is
said to be contained in a circuit.
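
The connection-matrix functions above translate readily to other notations. The Python below is a minimal sketch using lists of 0/1 rows and the four-node QRST example; `or_and` stands in for the inner product ∨.∧ and `tc` repeats the step Z←ω∨ω∨.∧ω until nothing changes.

```python
def or_and(a, b):
    """Boolean inner product A∨.∧B on lists of 0/1 rows."""
    return [[int(any(a[i][k] and b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))] for i in range(len(a))]


def tc(c):
    """Transitive closure: iterate Z = C or (C ∨.∧ C) to a fixed point."""
    while True:
        step = or_and(c, c)
        z = [[x | y for x, y in zip(r1, r2)] for r1, r2 in zip(c, step)]
        if z == c:
            return z
        c = z


# Nodes Q, R, S, T with connections S->Q, R->T, T->Q
C = [[0, 0, 0, 0],
     [0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]

out_degrees = [sum(row) for row in C]          # +/C
in_degrees = [sum(col) for col in zip(*C)]     # +⌿C
closure = tc(C)
strongly_connected = all(all(row) for row in closure)   # ∧/,TC C
circuit_nodes = [i + 1 for i in range(len(C)) if closure[i][i]]

print(closure[1])  # [1, 0, 0, 1]: from R one can reach Q and T
```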

A graph T is called a tree if it has no circuits
and its in-degrees do not exceed 1, that is, ∧/1≥+⌿T.
Any node of a tree with an in-degree of 0 is called
a root, and if K←+/0=+⌿T, then T is called a K-rooted
tree. Since a tree is circuit-free, K must be at least
1. Unless otherwise stated, it is normally assumed
that a tree is singly-rooted (that is, K=1);
multiply-rooted trees are sometimes called forests.

A graph C covers a graph D if ∧/,C≥D. If G is a
strongly-connected graph and T is a (singly-rooted)
tree, then T is said to be a spanning tree of G if G
covers T and if all nodes are reachable from the
root of T, that is,


      (∧/,G≥T) ∧ ∧/R∨R∨.∧TC T


where R is the (boolean representation of the) root
of T.

A depth-first spanning tree [9] of a graph G
is a spanning tree produced by proceeding from the
root through immediate descendants in G, always
choosing as the next node a descendant of the
latest in the list of nodes visited which still possesses
a descendant not in the list. This is a relatively
complex process which can be used to illustrate the
utility of the connection matrix representation:

      DFST:((,1)∘.=K) R ω∧K∘.∨~K←α=⍳1↑⍴ω               C.4
      R:(C,[1]α)Rω∧P∘.∨~C←<\U∧P∨.∧ω
          :~∨/P←(<\α∨.∧ω∨.∧U←~∨⌿α)∨.∧α
          :ω


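Using as an example the graph G from [9]:


      G                            1 DFST G
0 0 1 1 0 0 0 0 0 0 0 0      0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0      0 0 0 0 1 0 0 0 0 0 0 0
0 1 0 0 1 1 0 0 0 0 0 0      0 1 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 0 0 0 0      0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0      0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0      0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0      0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 1 0      0 0 0 0 0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0 0 0 0 1      0 0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 0 1      0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0      0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0      0 0 0 0 0 0 0 0 0 0 0 0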

The function DFST establishes the left argument
of the recursion R as the one-row matrix representing
the root specified by the left argument of DFST,
and the right argument as the original graph with
the connections into the root K deleted. The first
line of the recursion R shows that it continues by
appending on the top of the list of nodes thus far
assembled in the left argument the next child C,
and by deleting from the right argument all
connections into the chosen child C except the one
from its parent P. The child C is chosen from
among those reachable from the chosen parent
(P∨.∧ω), but is limited to those as yet untouched
(U∧P∨.∧ω), and is taken, arbitrarily, as the first of
these (<\U∧P∨.∧ω).

The determinations of P and U are shown in the
second line, P being chosen from among those nodes
which have children among the untouched nodes
(ω∨.∧U). These are permuted to the order of the
nodes in the left argument (α∨.∧ω∨.∧U), bringing
them into an order so that the last visited appears
first, and P is finally chosen as the first of these.

The last line of R shows the final result to be
the resulting right argument ω, that is, the original
graph with all connections into each node broken
except for its parent in the spanning tree. Since
the final value of α is a square matrix giving the
nodes of the tree in reverse order as visited,
substitution of ω,⌽[1]α (or, equivalently, ω,⊖α) for ω
would yield a result of shape 1 2×⍴G containing the
spanning tree followed by its "preordering"
information.
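
The process described above can be sketched in Python with an explicit list of visited nodes in place of the boolean matrices. This is an illustrative re-expression, not a transcription of DFST; the matrices are written as strings of digits, and row 8 of the example graph is read as a single connection to node 11, which is an assumption where the printed row is damaged.

```python
def dfst(root, g):
    """Depth-first spanning tree of connection matrix g from a 1-origin root."""
    n = len(g)
    tree = [[0] * n for _ in range(n)]
    visited = [root]                         # nodes in order of visit
    while True:
        # latest visited node that still has an untouched child
        for parent in reversed(visited):
            children = [j + 1 for j in range(n)
                        if g[parent - 1][j] and j + 1 not in visited]
            if children:
                break
        else:
            return tree                      # no parent has an untouched child
        child = children[0]                  # arbitrarily, the first of these
        tree[parent - 1][child - 1] = 1
        visited.append(child)


def rows(strings):
    return [[int(ch) for ch in s] for s in strings]


G = rows(["001100000000", "000010000000", "010011000000", "000000110000",
          "000000001000", "000000001000", "000000000100", "000000000010",
          "001000000001", "100000000001", "000000000100", "100000000000"])

T = dfst(1, G)
print("".join(map(str, T[2])))  # 010001000000
```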

Another representation of directed graphs often
used, at least implicitly, is the list of all node pairs
V,W such that there is a connection from V to W.
The transformation to this list form from the
connection matrix may be defined and used as follows:


      LFC:(,ω)/1+D⊤¯1+⍳×/D←⍴ω


      C            LFC C
0 0 1 1      1 1 2 3 3 4
0 0 1 0      3 4 3 2 4 1
0 1 0 1
1 0 0 0

However, this representation is deficient since it
does not alone determine the number of nodes in
the graph, although in the present example this is
given by ⌈/,LFC C because the highest numbered
node happens to have a connection. A related boolean
representation is provided by the expression
(LFC C)∘.=⍳1↑⍴C, the first plane showing the out- and the
second showing the in-connections.

An incidence matrix representation often used
in the treatment of electric circuits [10] is given
by the difference of these planes as follows:


      IFC:-⌿(LFC ω)∘.=⍳1↑⍴ω

For example:


      (LFC C)∘.=⍳1↑⍴C        IFC C
1 0 0 0                   1  0 ¯1  0
1 0 0 0                   1  0  0 ¯1
0 1 0 0                   0  1 ¯1  0
0 0 1 0                   0 ¯1  1  0
0 0 1 0                   0  0  1 ¯1
0 0 0 1                  ¯1  0  0  1

0 0 1 0
0 0 0 1
0 0 1 0
0 1 0 0
0 0 0 1
1 0 0 0


In dealing with non-directed graphs, one sometimes
uses a representation derived as the or over
these planes (∨⌿). This is equivalent to |IFC C.

The incidence matrix I has a number of useful
properties. For example, +/I is zero, +⌿I gives the
difference between the in- and out-degrees of each
node, ⍴I gives the number of edges followed by the
number of nodes, and ×/⍴I gives their product.
However, all of these are also easily expressed in
terms of the connection matrix, and more significant
properties of the incidence matrix are seen in
its use in electric circuits. For example, if the
edges represent components connected between the
nodes, and if V is the vector of node voltages, then
the branch voltages are given by I+.×V; if BI is the
vector of branch currents, the vector of node
currents is given by BI+.×I.
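
A brief Python sketch of the list form and the incidence matrix, using the four-node example C above. The names `lfc` and `ifc` are borrowed from the text; the list-of-pairs encoding and the sample node voltages are assumptions made for illustration.

```python
def lfc(c):
    """List of (from, to) node pairs, 1-origin, in row-major order."""
    return [(i + 1, j + 1) for i, row in enumerate(c)
            for j, bit in enumerate(row) if bit]


def ifc(c):
    """Incidence matrix: one row per edge, +1 at its source, -1 at its target."""
    n = len(c)
    return [[(k == v) - (k == w) for k in range(1, n + 1)] for v, w in lfc(c)]


C = [[0, 0, 1, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 0, 0]]

I = ifc(C)
row_sums = [sum(row) for row in I]                 # +/I: all zero
col_sums = [sum(col) for col in zip(*I)]           # +⌿I: out-degree minus in-degree
node_voltages = [10, 20, 30, 40]                   # hypothetical values
branch_voltages = [sum(a * v for a, v in zip(row, node_voltages)) for row in I]

print(lfc(C))  # [(1, 3), (1, 4), (2, 3), (3, 2), (3, 4), (4, 1)]
```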

The inverse transformation from incidence matrix
to connection matrix is given by:


      CFI:D⍴(¯1+⍳×/D)∊D⊥(1 ¯1∘.=ω)+.×¯1+⍳1↓D←⌊\⌽⍴ω


The set membership function ∊ yields a boolean
array, of the same shape as its left argument,
which shows which of its elements belong to the
right argument.


3.5 Symbolic Logic 

A boolean function of N arguments may be
represented by a boolean vector of 2*N elements in a
variety of ways, including what are sometimes
called the disjunctive, conjunctive, equivalence,
and exclusive-disjunctive forms. The transformation
between any pair of these forms may be represented
concisely as some 2*N by 2*N matrix formed
by a related inner product, such as T∨.∧⍉T, where
T←T N is the "truth table" formed by the function T
defined by A.2. These matters are treated fully in
[11, Ch.7].


4. Identities and Proofs 


In this section we will introduce some widely
used identities and provide formal proofs for some
of them, including Newton's symmetric functions
and the associativity of inner product, which are
seldom proved formally.


4.1 Dualities in Inner Products 

The dualities developed for reduction and scan
extend to inner products in an obvious way. If DF
is the dual of F and DG is the dual of G with respect
to a monadic function M with inverse MI, and if A
and B are matrices, then:


      A F.G B ←→ MI (M A) DF.DG (M B)


For example:


      A∨.∧B ←→ ~(~A)∧.∨(~B)
      A∧.=B ←→ ~(~A)∨.≠(~B)
      A⌊.+B ←→ -(-A)⌈.+(-B)


The dualities for inner product, reduction, and
scan can be used to eliminate many uses of boolean
negation from expressions, particularly when used
in conjunction with identities of the following
form:

      A∧(~B) ←→ A>B
      (~A)∧B ←→ A<B
      (~A)∧(~B) ←→ A⍱B
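
The first and third dualities can be spot-checked numerically. The Python below is a sketch with a hypothetical helper `inner` that plays the role of the generalized inner product F.G on small matrices; the sample matrices are arbitrary.

```python
from functools import reduce


def inner(f, g, a, b):
    """Generalized inner product A f.g B for matrices given as lists of rows."""
    return [[reduce(f, [g(a[i][k], b[k][j]) for k in range(len(b))])
             for j in range(len(b[0]))] for i in range(len(a))]


def negate_bool(m):
    return [[1 - x for x in row] for row in m]


def negate_num(m):
    return [[-x for x in row] for row in m]


def add(x, y):
    return x + y


def bit_or(x, y):
    return x | y


def bit_and(x, y):
    return x & y


A = [[1, 0], [0, 0]]
B = [[0, 1], [1, 1]]
lhs_bool = inner(bit_or, bit_and, A, B)                                      # A∨.∧B
rhs_bool = negate_bool(inner(bit_and, bit_or, negate_bool(A), negate_bool(B)))

P = [[1, 5], [2, 0]]
Q = [[3, 1], [4, 2]]
lhs_num = inner(min, add, P, Q)                                              # P⌊.+Q
rhs_num = negate_num(inner(max, add, negate_num(P), negate_num(Q)))

print(lhs_num)  # [[4, 2], [4, 2]]
```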


4.2 Partitioning Identities 


Partitioning of an array leads to a number of
obvious and useful identities. For example:


      ×/3 1 4 2 6 ←→ (×/3 1) × (×/4 2 6)


More generally, for any associative function F:


      F/V ←→ (F/K↑V) F (F/K↓V)
      F/V,W ←→ (F/V) F (F/W)

If F is commutative as well as associative, the
partitioning need not be limited to prefixes and
suffixes, and the partitioning can be made by
compression by a boolean vector U:


      F/V ←→ (F/U/V) F (F/(~U)/V)


If E is an empty vector (0=⍴E), the reduction F/E
yields the identity element of the function F, and
the identities therefore hold in the limiting cases
0=K and 0=∨/U.

Partitioning identities extend to matrices in an
obvious way. For example, if V, M, and A are arrays
of ranks 1, 2, and 3, respectively, then:


      V+.×M ←→ ((K↑V)+.×(K,1↓⍴M)↑M)+(K↓V)+.×(K,0)↓M        D.1
      (I,J)↑A+.×V ←→ ((I,J,0)↑A)+.×V                        D.2


4.3 Summarization and Distribution 


Consider the definition and use of the following
functions:


      N:(∨⌿<\ω∘.=ω)/ω                                D.3
      S:(Nω)∘.=ω                                      D.4

      A←3 3 1 4 1
      C←10 20 30 40 50

      N A        S A           (S A)+.×C
3 1 4      1 1 0 0 0     30 80 40
           0 0 1 0 1
           0 0 0 1 0


The function N selects from a vector argument
its nub, that is, the set of distinct elements it
contains. The expression S A gives a boolean
"summarization matrix" which relates the elements
of A to the elements of its nub. If A is a vector
of account numbers and C is an associated vector
of costs, then the expression (S A)+.×C evaluated
above sums or "summarizes" the charges to the
several account numbers occurring in A.
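
The account-number example can be reproduced with a few lines of Python. This is a sketch only: `nub` and `summarize` imitate N and S with plain lists, and the matrix product is written out as a sum.

```python
def nub(v):
    """Distinct elements of v in order of first appearance."""
    out = []
    for x in v:
        if x not in out:
            out.append(x)
    return out


def summarize(v):
    """Boolean summarization matrix: one row per nub element."""
    return [[int(x == u) for x in v] for u in nub(v)]


A = [3, 3, 1, 4, 1]
C = [10, 20, 30, 40, 50]
totals = [sum(s * c for s, c in zip(row, C)) for row in summarize(A)]  # (S A)+.×C

print(nub(A), totals)  # [3, 1, 4] [30, 80, 40]
```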

Used as postmultiplier, in expressions of the
form W+.×S A, the summarization matrix can be
used to distribute results. For example, if F is a
function which is costly to evaluate and its
argument V has repeated elements, it may be more
efficient to apply F only to the nub of V and distribute
the results in the manner suggested by the following
identity:


      F V ←→ (F N V)+.×S V                           D.5


The order of the elements of N V is the same as
their order in V, and it is sometimes more
convenient to use an ordered nub and corresponding
ordered summarization given by:


      ON:Nω[⍋ω]                                       D.6
      OS:(ONω)∘.=ω                                    D.7


The identity corresponding to D.5 is:


      F V ←→ (F ON V)+.×OS V                         D.8


The summarization function produces an
interesting result when applied to the function T defined
by A.2:


      +/S+⌿T N ←→ (0,⍳N)!N


In words, the sums of the rows of the summarization
matrix of the column sums of the subset matrix
of order N is the vector of binomial coefficients
of order N.
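
This last identity is easy to test numerically. In the Python sketch below, the columns of T N are assumed to be the binary representations of 0 through (2*N)-1 in ascending order, generated with `itertools.product`; `math.comb` supplies the binomial coefficients.

```python
from itertools import product
from math import comb


def binomials_via_summarization(n):
    col_sums = [sum(bits) for bits in product((0, 1), repeat=n)]   # +⌿T N
    nub = list(dict.fromkeys(col_sums))                            # N of the column sums
    return [sum(x == u for x in col_sums) for u in nub]            # +/S


print(binomials_via_summarization(4))  # [1, 4, 6, 4, 1]
```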


4.4 Distributivity 


The distributivity of one function over another
is an important notion in mathematics, and we will
now raise the question of representing this in a
general way. Since multiplication distributes to
the right over addition we have a×(b+q)←→ab+aq, and
since it distributes to the left we have (a+p)×b←→ab+pb.
These lead to the more general cases:


      (a+p)×(b+q) ←→ ab+aq+pb+pq
      (a+p)×(b+q)×(c+r) ←→ abc+abr+aqc+aqr+pbc+pbr+pqc+pqr
      (a+p)×(b+q)×...×(c+r) ←→ ab...c+....+pq...r


Using the notion that V←A,B and W←P,Q or V←A,B,C
and W←P,Q,R, etc., the left side can be written
simply in terms of reduction as ×/V+W. For this case of
three elements, the right side can be written as the
sum of the products over the columns of the
following matrix:


      V[0] V[0] V[0] V[0] W[0] W[0] W[0] W[0]
      V[1] V[1] W[1] W[1] V[1] V[1] W[1] W[1]
      V[2] W[2] V[2] W[2] V[2] W[2] V[2] W[2]

The pattern of V's and W's above is precisely
the pattern of zeros and ones in the matrix T←T⍴V,
and so the products down the columns are given by
(V×.*~T)×(W×.*T). Consequently:


      ×/V+W ←→ +/(V×.*~T)×W×.*T←T ⍴V               D.9
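
Identity D.9 can be checked numerically before it is proved. The Python below is a sketch in which each tuple from `itertools.product` plays the role of one column of T ⍴V: a 0 selects the element of V and a 1 the element of W.

```python
from itertools import product
from math import prod


def left_side(v, w):
    """×/V+W"""
    return prod(a + b for a, b in zip(v, w))


def right_side(v, w):
    """+/(V×.*~T)×W×.*T, one term per column of the subset matrix."""
    total = 0
    for col in product((0, 1), repeat=len(v)):
        total += prod(b if bit else a for a, b, bit in zip(v, w, col))
    return total


V, W = [2, 3, 5], [7, 11, 13]
print(left_side(V, W), right_side(V, W))  # 2268 2268
```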


We will now present a formal inductive proof of
D.9, assuming as the induction hypothesis that D.9
is true for all V and W of shape N (that is,
∧/N=(⍴V),⍴W) and proving that it holds for shape N+1,
that is, for X,V and Y,W, where X and Y are arbitrary
scalars.

For use in the inductive proof we will first give
a recursive definition of the function T, equivalent
to A.2 and based on the following notion: if M←T 2 is
the result of order 2, then:


      M
0 0 1 1
0 1 0 1

      0,[1]M        1,[1]M
0 0 0 0       1 1 1 1
0 0 1 1       0 0 1 1
0 1 0 1       0 1 0 1

      (0,[1]M),(1,[1]M)
0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1
0 1 0 1 0 1 0 1

Thus:

      T:(0,[1]T),(1,[1]T←Tω-1):0=ω:0 1⍴0                        D.10

      +/((C←X,V)×.*~Q)×D×.*Q←T⍴(D←Y,W)
      +/(C×.*~Z,U)×D×.*(Z←0,[1] T),U←1,[1] T←T⍴W               D.10
      +/((C×.*~Z),C×.*~U)×(D×.*Z),D×.*U                         Note 1
      +/((C×.*~Z),C×.*~U)×((Y*0)×W×.*T),(Y*1)×W×.*T             Note 2
      +/((C×.*~Z),C×.*~U)×(W×.*T),Y×W×.*T                       Y*0 1←→1,Y
      +/((X×V×.*~T),V×.*~T)×(W×.*T),Y×W×.*T                     Note 2
      +/(X×(V×.*~T)×W×.*T),(Y×(V×.*~T)×W×.*T)                   Note 3
      +/(X××/V+W),(Y××/V+W)                                     Induction hypothesis
      +/(X,Y)××/V+W                                             (X×S),(Y×S)←→(X,Y)×S
      ×/(X+Y),(V+W)                                             Definition of ×/
      ×/(X,V)+(Y,W)                                             + distributes over ,


Note 1: M+.×N,P ←→ (M+.×N),M+.×P (partitioning identity on matrices)


Note 2: V+.×M ←→ ((1↑V)+.×(1,1↓⍴M)↑M)+(1↓V)+.×1 0↓M
(partitioning identity on matrices and the definition of C, D, Z, and U)


Note 3: (V,W)×P,Q ←→ (V×P),W×Q

To complete the inductive proof we must show
that the putative identity D.9 holds for some value
of N. If N=0, the vectors A and B are empty, and
therefore X,A ←→ ,X and Y,B ←→ ,Y. Hence the left
side becomes ×/X+Y, or simply X+Y. The right side
becomes +/(X×.*~Q)×Y×.*Q, where ~Q is the
one-rowed matrix 1 0 and Q is 0 1. The right side is
therefore equivalent to +/(X,1)×(1,Y), or X+Y.
Similar examination of the case N=1 may be found
instructive.


4.5 Newton's Symmetric Functions 


If X is a scalar and R is any vector, then ×/X-R is
a polynomial in X having the roots R. It is
therefore equivalent to some polynomial C P X, and
assumption of this equivalence implies that C is a
function of R. We will now use D.8 and D.9 to
derive this function, which is commonly based on
Newton's symmetric functions:


      ×/X-R
      ×/X+(-R)
      +/(X×.*~T)×(-R)×.*T←T ⍴R                     D.9
      (X×.*~T)+.×P←(-R)×.*T                        Def of +.×
      (X*S←+⌿~T)+.×P                               Note 1
      ((X*ON S)+.×OS S)+.×P                        D.8
      (X*ON S)+.×((OS S)+.×P)                      +.× is associative
      (X*0,⍳⍴R)+.×((OS S)+.×P)                     Note 2
      ((OS S)+.×P)P X                              B.1 (polynomial)
      ((OS +⌿~T)+.×((-R)×.*T←T ⍴R))P X             Defs of S and P

Note 1: If X is a scalar and B is a boolean vector, then X×.*B
←→ X*+/B.


Note 2: Since T is boolean and has ⍴R rows, the sums of its columns range from 0
to ⍴R, and their ordered nub is therefore 0,⍳⍴R.
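
The final expression says that each coefficient is a sum of products of negated roots, grouped by how many roots are left out. The Python below sketches that grouping directly; it enumerates subsets with `itertools.product` rather than forming the summarization matrix, so it illustrates the result and is not a transcription of the derivation.

```python
from itertools import product
from math import prod


def coefficients_from_roots(r):
    """Coefficients (ascending powers) of the polynomial ×/X-R."""
    n = len(r)
    coeffs = [0] * (n + 1)
    for col in product((0, 1), repeat=n):                       # one column of T ⍴R
        p = prod(-root for root, bit in zip(r, col) if bit)     # (-R)×.*T
        coeffs[n - sum(col)] += p                               # power of X is +⌿~T
    return coeffs


def poly(c, x):
    return sum(ci * x ** i for i, ci in enumerate(c))


print(coefficients_from_roots([1, 2, 3]))  # [-6, 11, -6, 1]
```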


4.6 Dyadic Transpose 


The dyadic transpose, denoted by ⍉, is a
generalization of monadic transpose which permutes axes
of the right argument, and (or) forms "sectors" of
the right argument by coalescing certain axes, all
as determined by the left argument. We introduce
it here as a convenient tool for treating properties
of the inner product.

The dyadic transpose will be defined formally
in terms of the selection function


      SF:(,ω)[1+(⍴ω)⊥α-1]


which selects from its right argument the element
whose indices are given by its vector left argument,
the shape of which must clearly equal the rank of
the right argument. The rank of the result of K⍉A
is ⌈/K, and if I is any suitable left argument of the
selection I SF K⍉A then:


      I SF K⍉A ←→ (I[K])SF A                       D.11


For example, if M is a matrix, then 2 1 ⍉M ←→ ⍉M and
1 1 ⍉M is the diagonal of M; if T is a rank three array,
then 1 2 2 ⍉T is a matrix "diagonal section" of T
produced by running together the last two axes,
and the vector 1 1 1 ⍉T is the principal body
diagonal of T.

The following identity will be used in the
sequel:


      J⍉K⍉A ←→ (J[K])⍉A                            D.12


Proof:


      I SF J⍉K⍉A
      (I[J]) SF K⍉A            Definition of ⍉ (D.11)
      ((I[J])[K]) SF A         Definition of ⍉
      (I[(J[K])]) SF A         Indexing is associative
      I SF(J[K])⍉A             Definition of ⍉
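
Definition D.11 is enough to build a small working model of the dyadic transpose. The Python below is a sketch that stores an array as a dictionary from 1-origin index tuples to values (an encoding chosen here for brevity) and takes the length of each result axis to be the smallest of the coalesced source axes.

```python
from itertools import product


def shape_of(a):
    rank = len(next(iter(a)))
    return [max(idx[ax] for idx in a) for ax in range(rank)]


def transpose(k, a):
    """K⍉A for an array stored as {1-origin index tuple: value}."""
    shp = shape_of(a)
    rank = max(k)                                            # ⌈/K
    out_shape = [min(shp[ax] for ax in range(len(k)) if k[ax] == r + 1)
                 for r in range(rank)]
    # D.11: the element at index I of K⍉A is the element at index I[K] of A
    return {i: a[tuple(i[k[ax] - 1] for ax in range(len(k)))]
            for i in product(*(range(1, s + 1) for s in out_shape))}


M = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 4}
print(transpose((1, 1), M))  # {(1,): 1, (2,): 4}, the diagonal
```

With this model, D.12 can be tried on a rank-3 array by comparing `transpose(J, transpose(K, A))` with `transpose` applied once with the left argument J[K].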


4.7 Inner Products 

The following proofs are stated only for matrix
arguments and for the particular inner product
+.×. They are easily extended to arrays of higher
rank and to other inner products F.G, where F and G
need possess only the properties assumed in the
proofs for + and ×.

The following identity (familiar in mathematics
as a sum over the matrices formed by (outer)
products of columns of the first argument with
corresponding rows of the second argument) will be
used in establishing the associativity and
distributivity of the inner product:


      M+.×N ←→ +/1 3 3 2 ⍉ M∘.×N                   D.13


Proof: (I,J)SF M+.×N is defined as the sum over V,
where V[K] ←→ M[I;K]×N[K;J]. Similarly,


      (I,J)SF +/1 3 3 2 ⍉ M∘.×N

is the sum over the vector W such that


      W[K] ←→ (I,J,K)SF 1 3 3 2 ⍉ M∘.×N


Thus:

      W[K]
      (I,J,K)SF 1 3 3 2 ⍉M∘.×N
      (I,J,K)[1 3 3 2]SF M∘.×N          D.12
      (I,K,K,J)SF M∘.×N                 Def of indexing
      M[I;K]×N[K;J]                     Def of Outer product
      V[K]


Matrix product distributes over addition as
follows:


      M+.×(N+P) ←→ (M+.×N)+(M+.×P)                  D.14

Proof:

      M+.×(N+P)
      +/(J←1 3 3 2)⍉M∘.×N+P                 D.13
      +/J⍉(M∘.×N)+(M∘.×P)                   × distributes over +
      +/(J⍉M∘.×N)+(J⍉M∘.×P)                 ⍉ distributes over +
      (+/J⍉M∘.×N)+(+/J⍉M∘.×P)               + is assoc and comm
      (M+.×N)+(M+.×P)                       D.13
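
Identities D.13 and D.14 can be tried on small matrices. In the Python sketch below, `matmul_outer` accumulates, for each K, the outer product of column K of M with row K of N, which is the reading of D.13 given in the text; the sample matrices are arbitrary.

```python
def matmul(m, n):
    """Ordinary matrix product M+.×N."""
    return [[sum(m[i][k] * n[k][j] for k in range(len(n)))
             for j in range(len(n[0]))] for i in range(len(m))]


def matmul_outer(m, n):
    """M+.×N as a sum over k of (column k of M) ∘.× (row k of N)."""
    rows, cols = len(m), len(n[0])
    total = [[0] * cols for _ in range(rows)]
    for k in range(len(n)):
        for i in range(rows):
            for j in range(cols):
                total[i][j] += m[i][k] * n[k][j]
    return total


def add(a, b):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]


M = [[1, 2], [3, 4]]
N = [[0, 1], [5, 2]]
P = [[2, 2], [1, 0]]
print(matmul(M, N))  # [[10, 5], [20, 11]]
```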


Matrix product is associative as follows:


      M+.×(N+.×P) ←→ (M+.×N)+.×P                    D.15


Proof: We first reduce each of the sides to sums
over sections of an outer product, and then
compare the sums. Annotation of the second reduction
is left to the reader:


      M+.×(N+.×P)
      M+.×+/1 3 3 2⍉N∘.×P                          D.12
      +/1 3 3 2⍉M∘.×+/1 3 3 2⍉N∘.×P                D.12
      +/1 3 3 2⍉+/M∘.×1 3 3 2⍉N∘.×P                × distributes over +
      +/1 3 3 2⍉+/1 2 3 5 5 4⍉M∘.×N∘.×P            Note 1
      +/+/1 3 3 2 4⍉1 2 3 5 5 4⍉M∘.×N∘.×P          Note 2
      +/+/1 3 3 4 4 2⍉M∘.×N∘.×P                    D.12
      +/+/1 3 3 4 4 2⍉(M∘.×N)∘.×P                  × is associative
      +/+/1 4 4 3 3 2⍉(M∘.×N)∘.×P                  + is associative and commutative

      (M+.×N)+.×P
      (+/1 3 3 2⍉M∘.×N)+.×P
      +/1 3 3 2⍉(+/1 3 3 2⍉M∘.×N)∘.×P
      +/1 3 3 2⍉+/1 5 5 2 3 4⍉(M∘.×N)∘.×P
      +/+/1 3 3 2 4⍉1 5 5 2 3 4⍉(M∘.×N)∘.×P
      +/+/1 4 4 3 3 2⍉(M∘.×N)∘.×P

Note 1: +/M∘.×J⍉A ←→ +/((⍳⍴⍴M),J+⍴⍴M)⍉M∘.×A


Note 2: J⍉+/A ←→ +/(J,1+⌈/J)⍉A


4.8 Product of Polynomials

The identity B.2 used for the multiplication of
polynomials will now be developed formally:


      (B P X)×(C P X)
      (+/B×X*E←¯1+⍳⍴B)×(+/C×X*F←¯1+⍳⍴C)            B.1
      +/+/(B×X*E)∘.×(C×X*F)                         Note 1
      +/+/(B∘.×C)×((X*E)∘.×(X*F))                   Note 2
      +/+/(B∘.×C)×(X*(E∘.+F))                       Note 3


Note 1: (+/V)×(+/W)←→+/+/V∘.×W because × distributes over + and + is
associative and commutative, or see [12, p.21] for a proof.


Note 2: The equivalence of (P×V)∘.×(Q×W) and (P∘.×Q)×(V∘.×W) can be
established by examining a typical element of each expression.


Note 3: (X*I)×(X*J)←→X*(I+J)


The foregoing is the proof presented, in
abbreviated form, by Orth [13, p.52], who also defines
functions for the composition of polynomials.
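
The last line of the derivation says that the product polynomial collects the terms of B∘.×C according to the exponent table E∘.+F. A short Python sketch of that bookkeeping follows, with coefficients in ascending order of powers and non-empty coefficient lists assumed.

```python
def poly(c, x):
    """C P X: value of the polynomial with coefficients c at x."""
    return sum(ci * x ** i for i, ci in enumerate(c))


def poly_mul(b, c):
    """Coefficients of the product of two polynomials."""
    out = [0] * (len(b) + len(c) - 1)
    for e, bi in enumerate(b):
        for f, cj in enumerate(c):
            out[e + f] += bi * cj        # (B∘.×C)[e;f] carries X*(e+f)
    return out


B, C = [1, 2], [3, 0, 1]                 # 1+2x and 3+x*2
print(poly_mul(B, C))  # [3, 6, 1, 2]
```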


4.9 Derivative of a Polynomial 

Because of their ability to approximate a host 
of useful functions, and because they are closed 
under addition, multiplication, composition, differ- 
entiation, and integration, polynomial functions 
are very attractive for use in introducing the study 
of calculus. Their treatment in elementary calcu- 
lus is, however, normally delayed because the de- 
rivative of a polynomial is approached indirectly, 
as indicated in Section 2, through a sequence of 
more general results. 

The following presents a derivation of the
derivative of a polynomial directly from the
expression for the slope of the secant line through the
points X, F X and (X+Y),F(X+Y):


      ((C P X+Y)-(C P X))÷Y
      ((C P X+Y)-(C P X+0))÷Y
      ((C P X+Y)-((0*J)+.×(A←DS J∘.!J←¯1+⍳⍴C)+.×C) P X)÷Y     B.6
      ((((Y*J)+.×M) P X)-((0*J)+.×M←A+.×C) P X)÷Y             B.6
      ((((Y*J)+.×M)-(0*J)+.×M) P X)÷Y                          P dist over -
      ((((Y*J)-0*J)+.×M) P X)÷Y                                +.× dist over -
      (((0,Y*1↓J)+.×M) P X)÷Y                                  Note 1
      (((Y*1↓J)+.×1 0 ↓M) P X)÷Y                               D.1
      (((Y*1↓J)+.×(1 0 0 ↓A)+.×C) P X)÷Y                       D.2
      ((Y*1↓J-1)+.×(1 0 0 ↓A)+.×C) P X                         (Y*A)÷Y←→Y*A-1
      ((Y*¯1+⍳¯1+⍴C)+.×(1 0 0 ↓A)+.×C) P X                     Def of J
      (((Y*¯1+⍳¯1+⍴C)+.×1 0 0 ↓A)+.×C) P X                     D.15

Note 1: 0*0←→1←→Y*0 and ∧/0=0*1↓J


The derivative is the limiting value of the
secant slope for Y at zero, and the last expression
above can be evaluated for this case because if
E←¯1+⍳¯1+⍴C is the vector of exponents of Y, then all
elements of E are non-negative. Moreover, 0*E
reduces to a 1 followed by zeros, and the inner
product with 1 0 0↓A therefore reduces to the first plane
of 1 0 0↓A or, equivalently, the second plane of A.

If B←J∘.!J←¯1+⍳⍴C is the matrix of binomial
coefficients, then A is DS B and, from the definition of DS
in B.5, the second plane of A is B×1=-J∘.-J, that is,
the matrix B with all but the first super-diagonal
replaced by zeros. The final expression for the
coefficients of the polynomial which is the
derivative of the polynomial C P ω is therefore:


      ((J∘.!J)×1=-J∘.-J←¯1+⍳⍴C)+.×C

For example:


      C←5 7 11 13
      (J∘.!J)×1=-J∘.-J←¯1+⍳⍴C
0 1 0 0
0 0 2 0
0 0 0 3
0 0 0 0
      ((J∘.!J)×1=-J∘.-J←¯1+⍳⍴C)+.×C
7 22 39 0


Since the superdiagonal of the binomial
coefficient matrix (⍳N)∘.!⍳N is (¯1+⍳N-1)!⍳N-1, or simply
⍳N-1, the final result is 1⌽C×¯1+⍳⍴C in agreement
with the earlier derivation.
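
The example can be reproduced in Python as a check on the final expression. The sketch below builds the superdiagonal matrix with `math.comb` (element [i][j] is J[j] choose J[i], matching the APL J∘.!J) and compares the matrix product with the rotated form 1⌽C×¯1+⍳⍴C.

```python
from math import comb


def derivative_matrix(n):
    """(J∘.!J)×1=-J∘.-J for J = 0..n-1: binomials kept on the first superdiagonal."""
    return [[comb(j, i) if j - i == 1 else 0 for j in range(n)] for i in range(n)]


def derivative(c):
    m = derivative_matrix(len(c))
    return [sum(m[i][j] * c[j] for j in range(len(c))) for i in range(len(c))]


def derivative_by_rotation(c):
    scaled = [ci * i for i, ci in enumerate(c)]      # C×¯1+⍳⍴C
    return scaled[1:] + scaled[:1]                   # 1⌽


C = [5, 7, 11, 13]
print(derivative(C))  # [7, 22, 39, 0]
```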

In concluding the discussion of proofs, we will
re-emphasize the fact that all of the statements in
the foregoing proofs are executable, and that a
computer can therefore be used to identify errors.
For example, using the canonical function
definition mode [4, p.81], one could define a function F
whose statements are the first four statements of
the preceding proof as follows:


      ∇F
[1]   ((C P X+Y)-(C P X))÷Y
[2]   ((C P X+Y)-(C P X+0))÷Y
[3]   ((C P X+Y)-((0*J)+.×(A←DS J∘.!J←¯1+⍳⍴C)+.×C) P X)÷Y
[4]   ((((Y*J)+.×M) P X)-((0*J)+.×M←A+.×C) P X)÷Y
      ∇


The statements of the proof may then be executed
by assigning values to the variables and executing F
as follows:


      C←5 2 3 1
      Y←5

      X←3        X←⍳10

      F          F
132        66 96 132 174 222 276 336 402 474 552
132        66 96 132 174 222 276 336 402 474 552
132        66 96 132 174 222 276 336 402 474 552
132        66 96 132 174 222 276 336 402 474 552


The annotations may also be added as comments
between the lines without affecting the execution.
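
The same kind of check can be made in any executable notation. The Python below evaluates only the first statement of the proof (the secant slope itself), taking C←5 2 3 1 and Y←5 as in the session above; the value of Y is inferred here from the printed results.

```python
def poly(c, x):
    """C P X with coefficients in ascending order of powers."""
    return sum(ci * x ** i for i, ci in enumerate(c))


def secant_slope(c, x, y):
    """((C P X+Y)-(C P X))÷Y"""
    return (poly(c, x + y) - poly(c, x)) / y


C, Y = [5, 2, 3, 1], 5
print(secant_slope(C, 3, Y))  # 132.0
print([secant_slope(C, x, Y) for x in range(1, 11)])
```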


5. Conclusion 


The preceding sections have attempted to devel- 
op the thesis that the properties of executability 
and universality associated with programming lan- 
guages can be combined, in a single language, with 
the well-known properties of mathematical nota- 
tion which make it such an effective tool of 
thought. This is an important question which 
should receive further attention, regardless of the 
success or failure of this attempt to develop it in 
terms of APL. 

In particular, I would hope that others would 
treat the same question using other programming 


languages and conventional mathematical notation. 
If these treatments addressed a common set of top- 
ics, such as those addressed here, some objective 
comparisons of languages could be made. Treat- 
ments of some of the topics covered here are al- 
ready available for comparison. For example, Ker- 
ner [7] expresses the algorithm C.3 in both AL- 
GOL and conventional mathematical notation. 

This concluding section is more general, con- 
cerning comparisons with mathematical notation, 
the problems of introducing notation, extensions to 
APL which would further enhance its utility, and 
discussion of the mode of presentation of the earli- 
er sections. 


5.1 Comparison with Conventional Mathematical Notation

Any deficiency remarked in mathematical nota- 
tion can probably be countered by an example of 
its rectification in some particular branch of math- 
ematics or in some particular publication; compar- 
isons made here are meant to refer to the more 
general and commonplace use of mathematical 
notation. 

APL is similar to conventional mathematical 
notation in many important respects: in the use of 
functions with explicit arguments and explicit re- 
sults, in the concomitant use of composite expres- 
sions which apply functions to the results of other 
functions, in the provision of graphic symbols for 
the more commonly used functions, in the use of 
vectors, matrices, and higher-rank arrays, and in 
the use of operators which, like the derivative and 
the convolution operators of mathematics, apply to 
functions to produce functions. 

In the treatment of functions APL differs in 
providing a precise formal mechanism for the defi- 
nition of new functions. The direct definition 
form used in this paper is perhaps most appropriate 
for purposes of exposition and analysis, but the 
canonical form referred to in the introduction, and 
defined in [4, p.81], is often more convenient for 
other purposes. 

In the interpretation of composite expressions 
APL agrees in the use of parentheses, but differs in 
eschewing hierarchy so as to treat all functions 
(user-defined as well as primitive) alike, and in 
adopting a single rule for the application of both 
monadic and dyadic functions: the right argument 
of a function is the value of the entire expression 
to its right. An important consequence of this rule 
is that any portion of an expression which is free of 
parentheses may be read analytically from left to
right (since the leading function at any stage is the
"outer" or overall function to be applied to the
result on its right), and constructively from right
to left (since the rule is easily seen to be
equivalent to the rule that execution is carried out from
right to left).

Fig. 3.

      sum for j = 1 to n of j·2^(-j)

      1·2·3 + 2·3·4 + ... n terms ←→ (1/4) n(n+1)(n+2)(n+3)

      1·2·3·4 + 2·3·4·5 + ... n terms ←→ (1/5) n(n+1)(n+2)(n+3)(n+4)

      [approximation to the qth order derivative of a function f, from [16, p.30]]

Although Cajori does not even mention rules
for the order of execution in his two-volume
history of mathematical notations, it seems reasonable
to assume that the motivation for the familiar
hierarchy (power before × and × before + or -) arose
from a desire to make polynomials expressible
without parentheses. The convenient use of
vectors in expressing polynomials, as in +/C×X*E, does
much to remove this motivation. Moreover, the
rule adopted in APL also makes Horner's efficient
expression for a polynomial expressible without
parentheses:


      +/3 4 2 5×X*0 1 2 3 ←→ 3+X×4+X×2+X×5
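
Because Python uses the conventional hierarchy, the right-to-left reading of 3+X×4+X×2+X×5 has to be spelled out. The sketch below does so with an explicit loop and compares it with the sum of powers; the function names are illustrative.

```python
def poly_sum(c, x):
    """+/C×X*E with E = 0, 1, 2, ..."""
    return sum(ci * x ** e for e, ci in enumerate(c))


def right_to_left(c, x):
    """c0+X×c1+X×c2+... evaluated from the right, which is Horner's rule."""
    acc = c[-1]
    for ci in reversed(c[:-1]):
        acc = ci + x * acc
    return acc


print(poly_sum([3, 4, 2, 5], 2), right_to_left([3, 4, 2, 5], 2))  # 59 59
```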


In providing graphic symbols for commonly
used functions APL goes much farther, and
provides symbols for functions (such as the power
function) which are implicitly denied symbols in
mathematics. This becomes important when
operators are introduced; in the preceding sections the
inner product ×.* (which must employ a symbol for
power) played an equal role with the ordinary
inner product +.×. Prohibition of elision of function
symbols (such as ×) makes possible the
unambiguous use of multi-character names for variables
and functions.

In the use of arrays APL is similar to
mathematical notation, but more systematic. For
example, V+W has the same meaning in both, and in APL
the definitions for other functions are extended in
the same element-by-element manner. In
mathematics, however, expressions such as V×W and V÷W
are defined differently or not at all.


For example, V×W commonly denotes the vector
product [14, p.308]. It can be expressed in
various ways in APL. The definition


      VP:((1⌽α)×¯1⌽ω)-(¯1⌽α)×1⌽ω


provides a convenient basis for an obvious proof
that VP is "anticommutative" (that is,
V VP W ←→ -W VP V), and (using the fact that
¯1⌽X ←→ 2⌽X for 3-element vectors) for a simple
proof that in 3-space V and W are both orthogonal to
their vector product, that is, ∧/0=V+.×V VP W and
∧/0=W+.×V VP W.
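
The definition of VP translates directly once rotation is available. The following Python sketch uses a helper `rot` for k⌽v (left rotation, with negative k rotating right) and checks anticommutativity and orthogonality on one pair of sample vectors.

```python
def rot(v, k):
    """k⌽v: rotate the list v left by k places."""
    k %= len(v)
    return v[k:] + v[:k]


def vp(a, w):
    """((1⌽a)×¯1⌽w)-(¯1⌽a)×1⌽w"""
    first = [x * y for x, y in zip(rot(a, 1), rot(w, -1))]
    second = [x * y for x, y in zip(rot(a, -1), rot(w, 1))]
    return [p - q for p, q in zip(first, second)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


V, W = [1, 2, 3], [4, 5, 6]
print(vp(V, W))  # [-3, 6, -3]
```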

APL is also more systematic in the use of
operators to produce functions on arrays: reduction
provides the equivalent of the sigma and pi
notation (in +/ and ×/) and a host of similar useful
cases; outer product extends the outer product of
tensor analysis to functions other than ×, and inner
product extends ordinary matrix product (+.×) to
many cases, such as ∨.∧ and ⌊.+, for which ad hoc
definitions are often made.

The similarities between APL and conventional
notation become more apparent when one learns a
few rather mechanical substitutions, and the
translation of mathematical expressions is instructive.
For example, in an expression such as the first
shown in Figure 3, one simply substitutes ⍳N for
each occurrence of j and replaces the sigma by +/.
Thus:


      +/(⍳N)×2*-⍳N , or +/J×2*-J←⍳N


Collections such as Jolley's Summation of 
Series [15] provide interesting expressions for 
such an exercise, particularly if a computer is 
available for execution of the results. For example, 
on pages 8 and 9 we have the identities shown in 
the second and third examples of Figure 3. These 
would be written as: 


+/×/(¯1+⍳N)∘.+⍳3 ↔ (×/N+0,⍳3)÷4
+/×/(¯1+⍳N)∘.+⍳4 ↔ (×/N+0,⍳4)÷5

Together these suggest the following identity:


+/×/(¯1+⍳N)∘.+⍳K ↔ (×/N+0,⍳K)÷K+1


The reader might attempt to restate this general
identity (or even the special case where K=0) in
Jolley's notation.
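
Where a computer is available, the general summation identity can be checked numerically. The sketch below is a Python transcription under the assumption that the left side sums the products (i+1)(i+2)...(i+k) for i from 0 to n-1 and the right side is n(n+1)...(n+k) divided by k+1; the function names lhs and rhs are hypothetical.

```python
from math import prod

def lhs(n, k):
    """Sum of n terms, each a product of k consecutive integers."""
    return sum(prod(i + j for j in range(1, k + 1)) for i in range(n))

def rhs(n, k):
    """n(n+1)...(n+k) divided by k+1 (the division is always exact)."""
    return prod(n + j for j in range(k + 1)) // (k + 1)

# Compare both sides over a small grid, including the special case k = 0.
for k in range(0, 6):
    for n in range(1, 10):
        assert lhs(n, k) == rhs(n, k)
print(lhs(5, 3), rhs(5, 3))
```

A finite check of this kind is only evidence for the identity, not a proof of it.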

The last expression of Figure 3 is taken from a 
treatment of the fractional calculus [16, p.30], 
and represents an approximation to the qth order 
derivative of a function f. It would be written as: 


(S*-Q)×+/(J!J-1+Q)×F X-(J←¯1+⍳N)×S←(X-A)÷N


The translation to APL is a simple use of ⍳N as
suggested above, combined with a straightforward
identity which collapses the several occurrences of
the gamma function into a single use of the binomial
coefficient function !, whose domain is, of
course, not restricted to integers.

In the foregoing, the parameter Q specifies the
order of the derivative if positive, and the order of
the integral (from A to X) if negative. Fractional
values give fractional derivatives and integrals, and
the following function can, by first defining a function
F and assigning suitable values to N and A, be
used to experiment numerically with the derivatives
discussed in [16]:


OS:(S*-α)×+/(J!J-1+α)×F ω-(J←¯1+⍳N)×S←(ω-A)÷N
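
For readers without an APL system, the same experiment can be sketched in Python. This is an illustrative transcription, not the paper's code: it assumes that the binomial coefficient in the sum is (j-1-q) choose j (the APL expression being read right to left) and computes it by a running product instead of the gamma function; the name frac_derivative and its default arguments are assumptions.

```python
def frac_derivative(f, q, x, a=0.0, n=1000):
    """Approximate the q-th order derivative of f at x (integral if q < 0).

    Uses n equal steps of size s = (x - a) / n and the coefficients
    c_j = binomial(j - 1 - q, j), built up from c_0 = 1.
    """
    s = (x - a) / n
    total = 0.0
    c = 1.0
    for j in range(n):
        if j > 0:
            c *= (j - 1 - q) / j   # c_j from c_(j-1)
        total += c * f(x - j * s)
    return total * s ** (-q)

square = lambda t: t * t
print(frac_derivative(square, 1, 1.0))     # close to the derivative 2
print(frac_derivative(square, -1, 1.0))    # close to the integral 1/3
print(frac_derivative(square, 0.5, 1.0))   # a half-order derivative
```

With q equal to 1 the sum reduces to a backward difference quotient, and with q equal to -1 it reduces to a Riemann sum, which is a useful sanity check on the coefficients.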


Although much use is made of "formal" manip- 
ulation in mathematical notation, truly formal 
manipulation by explicit algorithms is very diffi- 
cult. APL is much more tractable in this respect. 
In Section 2 we saw, for example, that the derivative
of the polynomial expression (ω∘.*¯1+⍳⍴α)+.×α
is given by (ω∘.*¯1+⍳⍴α)+.×1⌽α×¯1+⍳⍴α, and a set of
functions for the formal differentiation of APL
expressions given by Orth in his treatment of the
calculus [13] occupies less than a page. Other
examples of functions for formal manipulation 
occur in [17, p.347] in the modeling operators for 
the vector calculus. 
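
The formal rule for differentiating a polynomial (multiply each coefficient by its exponent, then rotate the list left by one place) is easy to mimic. The Python sketch below assumes coefficients in ascending order of exponent; the names poly and derivative_coeffs are illustrative only.

```python
def poly(coeffs, x):
    """Evaluate a polynomial with coefficients in ascending order."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def derivative_coeffs(coeffs):
    """Coefficients of the derivative, kept at the same length.

    Each coefficient is multiplied by its exponent, and the list is
    rotated left by one place so the leading zero moves to the end.
    """
    scaled = [c * k for k, c in enumerate(coeffs)]
    return scaled[1:] + scaled[:1]

c = [5, 3, 2]                    # 5 + 3x + 2x^2
print(derivative_coeffs(c))      # coefficients of 3 + 4x, padded with 0
print(poly(derivative_coeffs(c), 2))
```

Keeping the trailing zero means the derivative has the same shape as its argument, so the rule can be applied repeatedly without special cases.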

Further discussion of the relationship with 
mathematical notation may be found in [3] and 
in the paper "Algebra as a Language" [6, p.325].

A final comment on printing, which has always 
been a serious problem in conventional notation. 
Although APL does employ certain symbols not 
yet generally available to publishers, it employs 
only 88 basic characters, plus some composite characters
formed by superposition of pairs of basic
characters. Moreover, it makes no demands such as
the inferior and superior lines and smaller type
fonts used in subscripts and superscripts.


5.2 The Introduction of Notation 

At the outset, the ease of introducing notation 
in context was suggested as a measure of suitability 
of the notation, and the reader was asked to ob- 
serve the process of introducing APL. The utility 
of this measure may well be accepted as a truism, 
but it is one which requires some clarification. 

For one thing, an ad hoc notation which provid- 
ed exactly the functions needed for some particular 
topic would be easy to introduce in context. It is 
necessary to ask further questions concerning the 
total bulk of notation required, the degree of struc- 
ture in the notation, and the degree to which nota- 
tion introduced for a specific purpose proves more 
generally useful. 

Secondly, it is important to distinguish the dif- 
ficulty of describing and of learning a piece of no- 
tation from the difficulty of mastering its implica- 
tions. For example, learning the rules for comput- 
ing a matrix product is easy, but a mastery of its 
implications (such as its associativity, its distributivity
over addition, and its ability to represent
linear functions and geometric operations) is a
different and much more difficult matter.

Indeed, the very suggestiveness of a notation 
may make it seem harder to learn because of the 
many properties it suggests for exploration. For 
example, the notation +.x for matrix product can- 
not make the rules for its computation more diffi- 
cult to learn, since it at least serves as a reminder 
that the process is an addition of products, but any 
discussion of the properties of matrix product in 
terms of this notation cannot help but suggest a 
host of questions such as: Is ∨.∧ associative? Over
what does it distribute? Is B∨.∧C ↔ ⍉(⍉C)∨.∧⍉B a
valid identity?
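
Questions of this kind invite experiment. As one hedged illustration, the Python sketch below checks the transposition question for the or-and inner product on every pair of 2-by-2 boolean matrices; the helper names or_and, transpose, and all_matrices are assumptions made for this example, and an exhaustive check at one size is evidence only, not a proof.

```python
from itertools import product

def transpose(m):
    """Interchange rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def or_and(b, c):
    """Inner product with 'or' in place of plus and 'and' in place of times."""
    return [[int(any(x and y for x, y in zip(row, col))) for col in zip(*c)]
            for row in b]

def all_matrices(rows, cols):
    """Generate every boolean matrix of the given shape."""
    for bits in product([0, 1], repeat=rows * cols):
        yield [list(bits[r * cols:(r + 1) * cols]) for r in range(rows)]

def identity_holds(b, c):
    """Does B or-and C equal the transpose of (transpose C) or-and (transpose B)?"""
    return or_and(b, c) == transpose(or_and(transpose(c), transpose(b)))

checked = all(identity_holds(b, c)
              for b in all_matrices(2, 2) for c in all_matrices(2, 2))
print(checked)
```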


5.3 Extensions to APL 

In order to ensure that the notation used in this 
paper is well-defined and widely available on exist- 
ing computer systems, it has been restricted to 
current APL as defined in [4] and in the more 
formal standard published by STAPL, the ACM 
SIGPLAN Technical Committee on APL 
[17, p.409]. We will now comment briefly on
potential extensions which would increase its con- 
venience for the topics treated here, and enhance 
its suitability for the treatment of other topics 
such as ordinary and vector calculus. 

One type of extension has already been suggested
by showing the execution of an example (roots
of a polynomial) on an APL system based on complex
numbers. This implies no change in function
symbols, although the domain of certain functions
will have to be extended. For example, |X will give
the magnitude of complex as well as real arguments,
+X will give the conjugate of complex arguments
as well as the trivial result it now gives for
real arguments, and the elementary functions will
be appropriately extended, as suggested by the use
of * in the cited example. It also implies the
possibility of meaningful inclusion of primitive
functions for zeros of polynomials and for eigenvalues
and eigenvectors of matrices.

A second type also suggested by the earlier sections
includes functions defined for particular purposes
which show promise of general utility. Examples
include the nub function ∪, defined by D.3,
and the summarization function ∩, defined by D.4.
These and other extensions are discussed in [18].
McDonnell [19, p.240] has proposed generalizations
of and and or to non-booleans so that A∨B is
the GCD of A and B, and A∧B is the LCM. The functions
GCD and LCM defined in Section 3 could then be
defined simply by GCD:∨/ω and LCM:∧/ω.
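
The effect of such a generalization can be sketched in Python by reducing a list with two-argument GCD and LCM functions. The names lcm, gcd_all, and lcm_all are hypothetical; the sketch also shows that on the values 0 and 1 the two functions behave as or and and respectively.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two non-negative integers (0 if either is 0)."""
    return a * b // gcd(a, b) if a and b else 0

def gcd_all(ws):
    """Reduction of a list by GCD."""
    return reduce(gcd, ws)

def lcm_all(ws):
    """Reduction of a list by LCM."""
    return reduce(lcm, ws)

print(gcd_all([12, 18, 30]), lcm_all([4, 6, 10]))
# Restricted to booleans, GCD acts as 'or' and LCM acts as 'and'.
print([gcd(a, b) for a in (0, 1) for b in (0, 1)])
print([lcm(a, b) for a in (0, 1) for b in (0, 1)])
```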

A more general line of development concerns 
operators, illustrated in the preceding sections by 
the reduction, inner-product, and outer-product. 
Discussions of operators now in APL may be found
in [20] and in [17, p.129], proposed new operators
for the vector calculus are discussed in
[17, p.47], and others are discussed in [18] and
in [17, p.129].


5.4 Mode of Presentation 


The treatment in the preceding sections con- 
cerned a set of brief topics, with an emphasis on 
clarity rather than efficiency in the resulting al- 
gorithms. Both of these points merit further com- 
ment. 

The treatment of some more complete topic, of 
an extent sufficient for, say, a one- or two-term 
course, provides a somewhat different, and perhaps 
more realistic, test of a notation. In particular, it 
provides a better measure of the amount of nota- 
tion to be introduced in normal course work. 

Such treatments of a number of topics in APL 
are available, including: high school algebra [6], 
elementary analysis [5], calculus [13], design of
digital systems [21], resistive circuits [10], and
crystallography [22]. All of these provide indications
of the ease of introducing the notation needed,
and one provides comments on experience in its
use. Professor Blaauw, in discussing the design of 
digital systems [21], says that "APL makes it 
possible to describe what really occurs in a complex 
system", that "APL is particularly suited to this 
purpose, since it allows expression at the high ar- 
chitectural level, at the lowest implementation 
level, and at all levels between", and that 
"learning the language pays of (sic) in- and out- 
side the field of computer design". 

Users of computers and programming languages 
are often concerned primarily with the efficiency 
of execution of algorithms, and might, therefore, 
summarily dismiss many of the algorithms pres- 
ented here. Such dismissal would be short-sighted, 
since a clear statement of an algorithm can usually 
be used as a basis from which one may easily derive
more efficient algorithms. For example, in
the function STEP of Section 3.2, one may significantly
increase efficiency by making substitutions
of the form B⌹M for (⌹M)+.×B, and in expressions
using +/C×X*¯1+⍳⍴C one may substitute X⊥⌽C or,
adopting an opposite convention for the order of
the coefficients, the expression X⊥C.
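
The second substitution replaces an explicit sum of powers by base-value evaluation, which amounts to nested multiplication (Horner's rule). A minimal Python sketch, with the illustrative names power_sum and base_value, shows that the two forms agree while the second performs no exponentiation.

```python
def power_sum(c, x):
    """Clear but costly form: sum of c[k] times x to the power k."""
    return sum(ck * x ** k for k, ck in enumerate(c))

def base_value(x, digits):
    """Base-x value of a digit list, most significant digit first."""
    acc = 0
    for d in digits:
        acc = acc * x + d
    return acc

c = [5, 3, 2]                         # ascending coefficients
print(power_sum(c, 10))               # 5 + 3*10 + 2*100
print(base_value(10, c[::-1]))        # same value via nested multiplication
```

Reversing the coefficient list corresponds to the choice between the two coefficient-order conventions mentioned above.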

More complex transformations may also be 
made. For example, Kerner's method (C.3) results
from a rather obvious, though not formally
stated, identity. Similarly, the use of the matrix α
to represent permutations in the recursive function
R used in obtaining the depth first spanning tree
(C.4) can be replaced by the possibly more compact 
use of a list of nodes, substituting indexing for in- 
ner products in a rather obvious, though not com- 
pletely formal, way. Moreover, such a recursive 
definition can be transformed into more efficient 
non-recursive forms. 

Finally, any algorithm expressed clearly in 
terms of arrays can be transformed by simple, 
though tedious, modifications into perhaps more 
efficient algorithms employing iteration on scalar 
elements. For example, the evaluation of +/X depends
upon every element of X and does not admit
of much improvement, but evaluation of ∨/B could
stop at the first element equal to 1, and might
therefore be improved by an iterative algorithm
expressed in terms of indexing.
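
The difference between the two evaluations of an or-reduction can be made concrete by counting the elements examined. The Python sketch below is an illustration only; the function names or_examine_all and or_stop_early are assumptions.

```python
def or_examine_all(bits):
    """Or-reduction that visits every element; returns (result, count)."""
    result, examined = 0, 0
    for b in bits:
        examined += 1
        result |= b
    return result, examined

def or_stop_early(bits):
    """Or-reduction that stops at the first 1; returns (result, count)."""
    examined = 0
    for b in bits:
        examined += 1
        if b == 1:
            return 1, examined
    return 0, examined

bits = [0, 0, 1, 0, 0, 0]
print(or_examine_all(bits))
print(or_stop_early(bits))
```

Both forms give the same result; only the amount of work differs, and the saving disappears when no element equals 1.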

The practice of first developing a clear and pre- 
cise definition of a process without regard to effi- 
ciency, and then using it as a guide and a test in 
exploring equivalent processes possessing other 
characteristics, such as greater efficiency, is very 
common in mathematics. It is a very fruitful practice
which should not be blighted by premature
emphasis on efficiency in computer execution.

Measures of efficiency are often unrealistic be- 
cause they concern counts of "substantive" func- 
tions such as multiplication and addition, and ig- 
nore the housekeeping (indexing and other selec- 
tion processes) which is often greatly increased by 
less straightforward algorithms. Moreover, realis- 
tic measures depend strongly on the current design 
of computers and of language embodiments. For 
example, because functions on booleans (such as ∧/B
and ∨/B) are found to be heavily used in APL,
implementers have provided efficient execution of
them. Finally, overemphasis of efficiency leads to 
an unfortunate circularity in design: for reasons of 
efficiency early programming languages reflected 
the characteristics of the early computers, and 
each generation of computers reflects the needs of 
the programming languages of the preceding gener- 
ation. 


Appendix A. Summary of Notation


Fω                  SCALAR FUNCTIONS               αFω
ω                   Conjugate     +  Plus
0-ω                 Negative      -  Minus
(ω>0)-ω<0           Signum        ×  Times
1÷ω                 Reciprocal    ÷  Divide
ω⌈-ω                Magnitude     |  Residue       ω-α×⌊ω÷α+α=0
Integer part        Floor         ⌊  Minimum       (ω×ω<α)+α×ω≥α
-⌊-ω                Ceiling       ⌈  Maximum       -(-α)⌊-ω
2.71828...*ω        Exponential   *  Power         ×/ω⍴α
Inverse of *        Natural log   ⍟  Logarithm     (⍟ω)÷⍟α
×/1+⍳ω              Factorial     !  Binomial      (!ω)÷(!α)×!ω-α
3.14159...×ω        Pi times      ○
Boolean:   ∧ ∨ ⍲ ⍱ ~  (and, or, not-and, not-or, not)
Relations: < ≤ = ≥ > ≠  (αRω is 1 if relation R holds).

              Sec.
              Ref.   V↔2 3 5          M↔1 2 3
                                        4 5 6
Integers      1      ⍳5↔1 2 3 4 5
Shape         1      ⍴V↔3    ⍴M↔2 3    2 3⍴⍳6↔M    2⍴4↔4 4
Catenation    1      V,V↔2 3 5 2 3 5    M,M↔1 2 3 1 2 3
                                             4 5 6 4 5 6
Ravel         1      ,M↔1 2 3 4 5 6
Indexing      1      V[3 1]↔5 2    M[2;2]↔5    M[2;]↔4 5 6
Compress      3      1 0 1/V↔2 5    0 1⌿M↔4 5 6
Take,Drop     1      2↑V↔2 3    ¯2↑V↔1↓V↔3 5
Reversal      1      ⌽V↔5 3 2
Rotate        1      2⌽V↔5 2 3    ¯2⌽V↔3 5 2
Transpose     1,4    ⍉ω reverses axes    α⍉ω permutes axes
Grade         3      ⍋3 2 6 2↔2 4 1 3    ⍒3 2 6 2↔3 1 2 4
Base value    1      10⊥V↔235    V⊥V↔50
 & inverse    1      10 10 10⊤235↔2 3 5    V⊤50↔2 3 5
Membership    3      V∊3↔0 1 0    V∊5 2↔1 0 1
Inverse       2,5    ⌹ω is matrix inverse    α⌹ω↔(⌹ω)+.×α
Reduction     1      +/V↔10    +/M↔6 15    +⌿M↔5 7 9
Scan          1      +\V↔2 5 10    +\M↔2 3⍴1 3 6 4 9 15
Inner prod    1      +.× is matrix product
Outer prod    1      0 3∘.+1 2 3↔M
Axis          1      F[I] applies F along axis I

Appendix B. Compiler from Direct to Canonical Form

This compiler has been adapted from [22, p.222].
It will not handle definitions which include α or :
or ω in quotes. It consists of the functions FIX and
F9, and the character matrices C9 and A9:


FIX 
OpOFX FS M 


D+F9 E;F;I;K 

Fe(,(E="w')0.*541)/,E,(04,pE)p' Y9 ! 

F+(,(F='a' )°.4541)/,F,(O4,pF)p' X9 ' 

F+ei+pD+(0,+/ 6,17 )+(-(3xI)++XI+":'"=F)ÓF,($6,pF)p' ' 
D«*34C9[1+(1+'a'eE),1,0;]3,89D[;31,(I+2+1F),2]) 
K+K+2xK<1$K+InKe(>f1 00'*D'0.=E)/Ke+X-I+EEA9 


F+*(0,1+pE)[pD+*D,(F,pE)+R0 24+KO' ',E,[1.5)';' 
D-(F+4D),[1]F[2] 'a',E 
C9 Ag 
Z9+ 012345678 
Y9Z9+ SABCDEFGH 
Y9Z29+X9 IJKLMNOPQ 
)/3>(0=1#, RSTUVWXYZ 
>0,0pZ9+ ABCDEFGHI 
JKLMNOPQR 
STUVWXYZD 
Example:

      FIX
FIB:Z,+/¯2↑Z←FIB ω-1:ω=1:1

      FIB 15
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610

      ⎕CR'FIB'
Z9←FIB Y9;Z
→(0=1↑,Y9=1)/3
→0,0⍴Z9←1
Z9←Z,+/¯2↑Z←FIB Y9-1
⍝FIB:Z,+/¯2↑Z←FIB ω-1:ω=1:1
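
The direct definition in this example has three parts separated by colons: the general case, a condition, and the result used when the condition holds. A hedged Python reading of the same recursion, with the hypothetical name fib, may help in following the compiled canonical form.

```python
def fib(n):
    """List of the first n Fibonacci numbers, following the recursive scheme:
    if n is 1 the result is the single item 1; otherwise extend the list
    for n - 1 by the sum of its last two items."""
    if n == 1:
        return [1]
    z = fib(n - 1)
    return z + [sum(z[-2:])]

print(fib(15))
```

Taking the last two items of a one-item list yields just that item here, which plays the role of the padded take in the APL version for the step from one item to two.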


THE INDUCTIVE METHOD OF INTRODUCING APL


Kenneth E. Iverson 
I.P. Sharp Associates 
Toronto, Ontario 


Because APL is a language, there are, in the teaching of it, many analogies with the 
teaching of natural languages. Because APL is a formal language, there are also many 
differences, yet the analogies prove useful in suggesting appropriate objectives and 
techniques in teaching APL. 


For example, adults learning a language already know a native language, and the 
initial objective is to learn to translate a narrow range of thoughts (concerning 
immediate needs such as the ordering of food) from the native language in which they 
are conceived, into the target language being learned. Attention is therefore directed 
to imparting effective use of a small number of words and constructs, and not to the 
memorization of a large vocabulary. Similarly, a student of APL normally knows the 
terminology and procedures of some area of potential application of computers, and 
the initial objective should be to learn enough to translate these procedures into APL.
Obvious as this may seem, introductory courses in APL (and in other programming
languages as well) often lack such a focus, and concentrate instead on exposing the
student to as much of the vocabulary (i.e., the primitive functions) of APL as possible.


This paper treats some of the lessons to be drawn from analogies with the teaching 
of natural languages (with emphasis on the inductive method of teaching), examines 
details of their application in the development of a three-day introductory course in 
APL, and reports some results of use of the course. Implications for more advanced 
courses are also discussed briefly. 


1. The Inductive Method 


Grammars present general rules, such as for the conjugation of verbs, which the student 
learns to apply (by deduction) to particular cases as the need arises. This form of 
presentation contrasts sharply with the way the mother tongue is learned from repeated 
use of particular instances, and from the more or less conscious formulation (by 
induction) of rules which summarize the particular cases.


The inductive method is now widely used in the teaching of natural languages. One 
of the better-known methods is that pioneered by Berlitz [1] and now known as the 
“direct” method. A concise and readable presentation and analysis of the direct method 
may be found in Diller [2]. 


A class in the purely inductive mode is conducted entirely in the target language, with 
no use of the student's mother tongue. Expressions are first learned by imitation, and
concepts are imparted by such devices as pointing, pictures, and pantomime; students
answer questions, learn to ask questions, and experiment with their own statements, 
all with constant and immediate reaction from the teacher in the form of correction, 
drill, and praise, expressed, of course, in the target language. 


In the analogous conduct of an APL course, each student (or, preferably, each student 
pair) is provided with an APL terminal, and with a series of printed sessions which 
give explicit expressions to be “imitated” by entering them on the terminal, which 
suggest ideas for experimentation, and which pose problems for which the student must 
formulate and enter appropriate expressions. Part of such a session is shown as an 
example in Figure 1. 


SESSION 1: NAMES AND EXPRESSIONS 


The left side of each page provides examples to be entered on the keyboard, and the 
right side provides comments on them. Each expression entered must be followed by 
striking the RETURN key to signal the APL system to execute the expression. 


      AREA←8×2                      The name AREA is assigned to the result
      HEIGHT←3                      of the multiplication, that is 16
      VOLUME←HEIGHT×AREA
      HEIGHT×AREA                   If no name is assigned to the result, it
                                    is printed
      VOLUME

      3×8×2

      LENGTH←8 7 6 5                Names may be assigned to lists
      WIDTH←2 3 4 5
      LENGTH×WIDTH
16 21 24 25
      PERIMETER←2×(LENGTH+WIDTH)    Parentheses specify the order in which
      PERIMETER                     parts of an expression are to be
20 20 20 20                         executed

      1.12×1.12×1.12                Decimal numbers may be used
1.404928

      1.12*3                        Yield of 12 percent for 3 years
1.404928


SAMPLE PORTION OF SESSION 


Figure 1 


Because APL is a formal “imperative” language, the APL system can execute any 
expression entered on the terminal, and therefore provides most of the reaction required 
from a teacher. The role of the instructor is therefore reduced to that of tutor, providing 
explicit help in the event of severe difficulties (such as failure of the terminal), and 
general discussion as required. As compared to the case of a natural language, the 
student is expected, and is better able, to assess his own performance. 


Applied to natural languages, the inductive method offers a number of important
advantages: 


1. Many dull but essential details (such as pronunciation) required at the outset are 
acquired in the course of doing more interesting things, and without explicit drill 
in them. 


2. The fun of constantly looking for the patterns or rules into which examples can 
be fitted provides a stimulation lacking in the explicit memorization of rules, and 
the repeated examples provide, as always, the best mnemonic basis for 
remembering general rules. 


3. The experience of committing error after error, seeing that they produce no lasting 
harm, and seeing them corrected through conversation, gives the student a 
confidence and a willingness to try that is difficult to impart by more formal 
methods. 


4. The teacher need not be expert in two languages, but only in the target language.


Analogous advantages are found in the teaching of APL:


1. Details of the terminal keyboard are absorbed gradually while doing interesting 
things from the very outset. 


2. Most of the syntactic rules, and the extension of functions to arrays, can be quickly 
gleaned from examples such as those presented in Figure 1. 


3. The student soon sees that most errors are harmless, that the nature of most are 
obvious from the simple error messages, and that any adverse effects (such as an 
open quote) are easily rectified by consulting a manual or a tutor. 


4. The tutor need only know APL, and does not need to be expert in areas such 
as financial management or engineering to which students wish to apply APL, 
and need not be experienced in lecturing. 


2. The Use Of Reference Material 


In the pure use of the inductive method, the use of reference material such as grammars 
and dictionaries would be forbidden. Indeed, their use is sometimes discouraged because 
the conscious application of grammatical rules and the conscious pronunciation of words 
from visualization of their spellings promotes uneven delivery. However, if a student 
is to become independent and capable of further study on his own, he must be 
introduced to appropriate reference material. 


Effective use of reference material requires some practice, and the student should 
therefore be introduced to it early. Moreover, he should not be confined to a single 
reference; at the outset, a comprehensive dictionary is too awkward and confusing, but 
a concise dictionary will soon be found to be too limited. 


In the analogous case of APL, the role of both grammar and dictionary is played by 
the reference manual. A concise manual limited to the core language [3] should be 
supplemented by a more comprehensive manual (such as Berry [4]) which covers all
aspects of the particular system in use. Moreover, the student should be led immediately
to locate the two or three main summary tables in the manual, and should be prodded
into constant use of the manual by explicit questions (such as “what is the name of 
the function denoted by the comma”), and by glimpses of interesting functions. 


3. Order Of Presentation 


Because the student is constantly striving to impose a structure upon the examples 
presented to him, the order of presentation of concepts is crucial, and must be carefully 
planned. For example, use of the present tense should be well established before other 
tenses and moods are introduced. The care taken with the order of presentation should, 
however, be unobtrusive, and the student may become aware of it only after gaining 
experience beyond the course, if at all. 


We will address two particular difficulties with the order of presentation, and exemplify 
their solutions in the context of APL. The first is that certain expressions are too 
complex to be treated properly in detail at the point where they are first useful. These 
can be handled as “useful expressions” and will be discussed in a separate section. 


The second difficulty is that certain important notions are rendered complex by the 
many guises in which they appear. The general approach to such problems is to present 
the essential notion early, and return to it again and again at intervals to reinforce 
it and to add the treatment of further aspects. 


For example, because students often find difficulty with the notion of literals (i.e., 
character arrays), its treatment in APL is often deferred, even though this deferral also 
makes it necessary to defer important practical notions such as the production of 
reports. In the present approach, the essential notion is introduced early, in the manner 
shown in Figure 2. Literals are then returned to in several contexts: in the 
representation of function definitions; in discussion of literal digits and the functions 
(⍕ and ⍎) which are used to transform between them and numbers in the production
of reports; and in their use with indexing to produce barcharts. 


Function definition is another important idea whose treatment is often deferred because 
of its seeming complexity. However, this complexity inheres not in the notion itself, 
but in the mechanics of the general del form of definition usually employed. This 
complexity includes a new mode of keyboard entry with its own set of error messages, 
a set of rules for function headers, confusion due to side-effects resulting from failure 
to localize names used or to definitions which print results but have no explicit results, 
and the matter of suspended functions. 


All of this is avoided by representing each function definition by a character vector in 
the direct form of definition [5 6]. For example, a student first uses the function 
ROUND provided in a workspace, then shows its definition, and then defines an 
equivalent function called R as follows: 


      ROUND 24.78 31.15 28.59
25 31 29


      SHOW 'ROUND'
ROUND:⌊.5+ω


SESSION 4: LITERALS


      JANET←5               Janet received 5 letters today
      MARY←8
      MARY⌈JANET            The maximum received by one of them
8
      MARY⌊JANET            The minimum
5
      MARY>JANET            Mary received more than Janet
1
      MARY=JANET            They did not receive an equal number
0


What sense can you make of the following sentences:

      JANET has 5 letters and MARY has 8

      JANET has 5 letters and MARY has 4

      'JANET' has 5 letters and 'MARY' has 4

The last sentence above uses quotation marks in the usual way to make a literal
reference to the (letters in the) name itself as opposed to what it denotes. The second
points up the potential ambiguity which is resolved by quote marks.


      LIST←24.6 3 17
      ⍴LIST
3
      WORD←'LIST'
      ⍴WORD
4
      SENTENCE←'LIST THE NET GAINS'


INTRODUCTION OF LITERALS

Figure 2


      DEFINE 'R:⌊.5+ω'

      R 24.78 31.15
25 31


The function DEFINE compiles the definition provided by its argument into an
appropriate del form, localizes any names which appear to the left of assignment arrows 
in the definition, provides a “trap” or “lock” appropriate to the particular APL system 
so that the function defined behaves like a primitive and cannot be suspended, and 
appends the original argument in a comment line for use by the function SHOW. 
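
The rounding rule used by ROUND and R (add one half, then take the floor, item by item) can be mimicked in Python for comparison. The name round_half_up is an assumption introduced for this sketch.

```python
from math import floor

def round_half_up(values):
    """Apply floor(0.5 + w) to each item of a list."""
    return [floor(0.5 + w) for w in values]

print(round_half_up([24.78, 31.15, 28.59]))
```

As in the APL session, the function applies to a whole list at once, so no explicit loop appears at the point of use.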


This approach makes it possible to introduce simple function definition very early and
to use it in a variety of interesting contexts before introducing conditional and recursive 
definitions (also in the direct form), and the more difficult del form. 


4. Teaching Reading 


It is usually much easier to read and comprehend a sentence than it is to write a 
sentence expressing the same thought. Inductive teaching makes much use of such 
reading, and the student is encouraged to scan an entire passage, using pictures, context, 
and other clues, to grasp the overall theme before invoking the use of a dictionary to 
clarify details. 


Because the entry of an APL expression on a terminal immediately yields the overall 
result for examination by the student, this approach is particularly effective in teaching 
APL. For example, if the student’s workspace has a table of names of countries, and 
a table of oil imports by year by country by month, then the sequence: 


      N←25
      B←+/[1]+/[3] OIL
      COUNTRIES,'.⎕'[1+B∘.≥(⌈/B)×(⍳N)÷N]


produces the following result, which has the obvious interpretation as a barchart of 
oil imports: 


[Barchart output: one row for each of ARABIA, NIGERIA, CANADA, INDONESIA, IRAN,
LIBYA, ALGERIA, and OTHER, each row showing a bar of ⎕ characters padded with
dots to a width of 25. The individual bar lengths are not recoverable from the source.]


Moreover, because the simple syntax makes it easy to determine the exact sequence 
in which the parts of the sentence are executed, a detailed understanding of the 
expression can be gained by executing it piece-by-piece, as illustrated in Figure 3. 
Finally, such critical reading of an expression can lead the student to formulate his 
own definition of a useful related function as follows: 


      DEFINE ⍞
BARCHART:'.⎕'[1+ω∘.≥((⍳α)÷α)×⌈/ω]
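
The barchart construction compares each value with a set of equally spaced thresholds running up to the largest value, and uses the comparison results to select one of two characters. A Python sketch of that idea follows; the name barchart, the parameter order, and the use of '#' in place of the box character are assumptions made for this illustration.

```python
def barchart(width, values, chars=".#"):
    """Return one string per value: a bar of chars[1] padded with chars[0].

    Each value is compared with width thresholds spaced equally from
    top/width up to top, where top is the largest value.
    """
    top = max(values)
    thresholds = [top * (i + 1) / width for i in range(width)]
    return ["".join(chars[1] if v >= t else chars[0] for t in thresholds)
            for v in values]

for row in barchart(5, [10, 4, 0]):
    print(row)
```

The largest value always fills the full width, so the chart is scaled to the data without any separate normalization step.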


5. Useful Expressions 


As remarked in Section 3, some expressions are too useful and important to be deferred 
to the point that would be dictated by the complexity of their structure. In APL such 
expressions can be handled by introducing them as defined functions whose use may 
be grasped immediately, but whose internal definition may be left for later study. 


For example, files can be introduced in terms of the functions GET, TO, RANGE, and 
REMOVE, illustrated in Figure 4. These can be grasped and used effectively by the 


      N                       The width of the barchart

      Q←(⍳N)÷N                Numbers from 0 to 1 in 25 equal steps
                              (display if desired)
      ⌈/B                     The largest value to be charted
      C←(⌈/B)×Q               Numbers from 0 to the largest value to be
                              charted

      B∘.≥C                   Comparison of each value of B with
                              each value in the range to be charted

[Boolean table of results; not recoverable from the source]

                              Examine a piece

[Remaining expressions and results not recoverable from the source]


DETAILED EXECUTION OF AN EXPRESSION


Figure 3


student at an earlier stage and with much greater ease than can the underlying 
language elements from which they must be constructed in most APL systems. 


A further example is provided by the function needed to compile, display, and edit the 
character vectors used in direct definition of functions. For example, an editing function 
which deletes each position indicated by a slash, and inserts ahead of the position of 
the first comma any text which follows it (in the manner provided for del editing in 
many APL systems) is illustrated in Figure 5. 


Deferral of the internal details of the definition of these essential functions can, in fact, 
be turned to advantage, because they provide interesting exercises in reading (using the 
techniques of Section 4) the definitions of functions whose purposes are already clear 
from repeated use. For example, critical reading of the following definition of the 
function EDIT is very helpful in grasping the important idea of recursive definition: 


EDIT:EDIT(A DELETE K↑ω),(1↓K↓A),(K←+/∧\A≠',')↓ω:0=⍴A←⍞,0⍴⎕←ω:ω
DELETE:(~(⍴ω)↑'/'=α)/ω
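
One step of the editing scheme described above (delete each position marked by a slash, and insert the text that follows the first comma ahead of the comma's position) can be sketched in Python. The function edit_step is a hypothetical, non-interactive reading of that description; the recursion and keyboard input of EDIT are left out, and the command lines used below are inferred examples, not transcriptions.

```python
def edit_step(text, command):
    """Apply one editing command to text and return the revised text."""
    # k is the number of command characters before the first comma.
    k = command.find(",") if "," in command else len(command)
    head, tail = text[:k], text[k:]
    # Drop every position of the head that is marked by a slash.
    kept = "".join(ch for ch, mark in zip(head, command[:k].ljust(len(head)))
                   if mark != "/")
    inserted = command[k + 1:]          # text following the comma, if any
    return kept + inserted + tail

step1 = edit_step("DDELLLETN AND INSRTION", "/  //   ,IO")
step2 = edit_step(step1, " " * 16 + ",E")
print(step1)
print(step2)
```

Repeating such steps until an empty command is entered corresponds to the recursive structure of the definition being read.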


Analysis of the complete set of functions provided for the compilation from direct 
definition form also provides an interesting exercise in reading, but one which would 
not be completed, or perhaps even attempted, until after completion of an introductory 
course. Extensive leads to other interesting reading, of both workspaces and published 
material, should be given the student to encourage further growth after the conclusion 
of formal course work. 


If the first dimension of an array (list, table, or list of tables) has the value N (for
example, 1↑⍴OIL is 7), then it may be distributed to N items of a file by a single
operation. For example:


      OIL TO 'IMPORTS 72 73 74 75 76 77 78'


*Use the function GET to retrieve individual items from the IMPORTS file to verify the
effect of the preceding expression.


      COUNTRIES TO 'IMPORTS 1'          Non-numeric data may be entered


The functions RANGE and REMOVE are useful in managing files:


      RANGE 'IMPORTS'                   Gives range of indices
1 72 73 74 75 76 77 78


      REMOVE 'IMPORTS 73 75 77'         Removes odd years


      RANGE 'IMPORTS'
1 72 74 76 78


FUNCTIONS FOR USING FILES 


Figure 4 


      TEXT←'DDELLLETN AND INSRTION'
      Z←EDIT TEXT                       Apply EDIT to erroneous text
DDELLLETN AND INSRTION                  Line printed by the function
/  //   ,IO                             Line entered on keyboard
DELETION AND INSRTION                   Line printed by the function
                ,E                      Line entered on keyboard
DELETION AND INSERTION                  Line printed by the function
                                        Empty line entered on keyboard (carriage
                                        return alone) ends execution of EDIT


      DEFINE 'REVISE:DEFINE EDIT SHOW ω'    Define a function for revision


      REVISE 'SUM'
SUM:+/[α]ω
///,MAX
MAX:+/[α]ω
    /,⌈
MAX:⌈/[α]ω


FUNCTIONS FOR EDITING AND REVISION 


Figure 5 


6. Advanced Courses


Advanced language courses can also employ the inductive method, but the greater the 
student's mastery of a language, the greater the potential benefits of the deductive 
approach and of explicit analysis of the structure of the language. A point sometimes 
made in the advanced treatment of natural languages is that grammar and related 
matters can now be discussed in the target language, avoiding distractions and 
distortions which might be introduced by use of the mother tongue. 


Similar remarks apply to advanced APL courses. In particular, the use of APL in its 
own discussion and in the introduction of the more complex functions is quite 
productive. For example, reduction is very useful in discussing the inner product, and 
inner product and grade are helpful in analyzing dyadic transpose. 


7. Conduct Of The Course


The introductory course on which these remarks are based evolved through four 
versions offered over a period of several months. The resulting course covers three 
contiguous days, and has been offered a number of times in the final form. 


Most students appear to work better in pairs than when assigned individually to 
terminals. Because there are no lectures, each pair can work at their own pace. 
Observations and student comments show that they find it more stimulating than a 
lecture course, and tend to come early and work late. Moreover, they learn to consult 
manuals much more than in a lecture course, and exhibit a good deal of independence 
by the end of the three days. 